ENGINEERING · 22 May 2026 · 5 MIN READ

Putting an LLM Inside a System Utility

A disk-usage scan gives you a wall of folders sorted by size. MacGuard pairs that scan with Claude Haiku to turn it into a short list of things actually worth doing.

Every disk analyser shows you the same thing: a wall of folders sorted by size. It is accurate, and it is not very helpful. Knowing that Library is large does not tell you what to do about it. When I built the AI Disk Analyzer inside MacGuard, the goal was to close that gap: to turn a scan into a short list of actions worth taking.

Why reach for a model at all

Disk cleanup is mostly judgement, not maths. A 4 GB folder of forgotten downloads is safe to clear. A 4 GB folder of caches will simply rebuild itself. A 4 GB application you use daily should obviously be left alone. The sizes are identical; the right action is completely different.

That judgement is exactly the sort of fuzzy, context-heavy reasoning a language model is good at and a sort function is not. So MacGuard does the deterministic part itself, walking the disk and measuring everything, and hands the judgement to a model.
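To make the deterministic half concrete, here is a minimal sketch of walking a tree and totalling bytes per top-level folder. It is written in Python for brevity and is not MacGuard's traversal engine; the names `folder_sizes` and `largest_first` are hypothetical.

```python
import os
from pathlib import Path


def folder_sizes(root: Path) -> dict[str, int]:
    """Total bytes per immediate child of `root`.

    Files that sit directly in `root` are grouped under ".".
    """
    totals: dict[str, int] = {}
    # followlinks=False keeps the walk from escaping the tree via symlinks.
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        rel = Path(dirpath).relative_to(root)
        top = rel.parts[0] if rel.parts else "."
        for name in filenames:
            try:
                # lstat measures a symlink itself rather than its target.
                size = os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                # Unreadable or vanished files are skipped, not fatal.
                continue
            totals[top] = totals.get(top, 0) + size
    return totals


def largest_first(totals: dict[str, int]) -> list[tuple[str, int]]:
    """The classic 'wall of folders': sorted by size, biggest first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

Everything here is measurable and testable. Nothing in it says whether a folder is worth clearing, which is the part left to the model.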

Haiku, because the job is small and frequent

I used Claude Haiku for this. The reasoning involved is not heavy, the prompt is compact, and people will run a cleanup more than once. A fast, inexpensive model is the right fit. Spending a slow, costly request on "is this cache safe to clear" would be using a sledgehammer to crack a nut.

Send the shape of the disk, not the disk

This is the part that matters most. The model never sees your files. It does not need them, and it would be wrong to send them.

MacGuard's traversal engine produces a compact summary first: aggregated sizes, file types, locations, ages, and obvious patterns like duplicate clusters or stale downloads. That summary, not raw file contents, is what goes to the model. It keeps the request small enough to stay fast and cheap, and it means a disk analyser never ships your personal data off the machine to do its job.
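One way such a summary could be built is sketched below, assuming the scan yields one record per file with a path relative to the scan root, a size, and an age in days. The `FileRecord` type, the field names, and the 180-day staleness threshold are illustrative assumptions, not MacGuard's actual format.

```python
from collections import defaultdict
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class FileRecord:
    path: str       # relative to the scan root, e.g. "Downloads/a.dmg"
    size: int       # bytes
    age_days: int   # days since last modification


def summarise(records, stale_after_days=180):
    """Aggregate records by (top-level location, file type).

    The output carries totals only: no file names, no contents.
    """
    groups = defaultdict(lambda: {"bytes": 0, "files": 0, "stale_bytes": 0})
    for rec in records:
        p = PurePosixPath(rec.path)
        location = p.parts[0] if len(p.parts) > 1 else "."
        file_type = p.suffix.lower() or "(none)"
        group = groups[(location, file_type)]
        group["bytes"] += rec.size
        group["files"] += 1
        if rec.age_days >= stale_after_days:
            group["stale_bytes"] += rec.size
    rows = [
        {"location": loc, "type": file_type, **stats}
        for (loc, file_type), stats in groups.items()
    ]
    # Largest groups first, so a truncated summary keeps what matters most.
    rows.sort(key=lambda r: r["bytes"], reverse=True)
    return rows
```

Serialised as JSON, a few dozen rows like these describe the shape of a disk without naming a single file on it.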

Ask for structure, not prose

A friendly paragraph about your disk is useless to a UI. The analyser asks the model for a structured response: a ranked list of recommendations, each with a category, an estimate of the space involved, and a short plain-English reason.

Structured output is what lets the AI layer slot cleanly into a native app. Each recommendation becomes a row, sorted so the safe, high-impact wins land at the top. The model supplies the judgement; the app stays firmly in control of how that judgement is shown and acted on.
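As a rough illustration of the app staying in control, the sketch below parses a model reply, drops anything that does not match the expected shape, and does the ranking itself. The JSON field names (`category`, `estimated_bytes`, `reason`), the category labels, and the risk ordering are all assumptions made for this example, not MacGuard's real schema.

```python
import json
from dataclasses import dataclass

# Hypothetical categories mapped to a risk rank; lower means safer to act on.
RISK_ORDER = {"stale_download": 0, "duplicate": 0, "cache": 1, "application": 2}


@dataclass(frozen=True)
class Recommendation:
    category: str
    estimated_bytes: int
    reason: str


def parse_recommendations(raw: str) -> list[Recommendation]:
    """Validate the model's JSON reply and rank it for display."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return []  # A malformed reply yields no rows rather than a crash.
    if not isinstance(payload, list):
        return []
    recs = []
    for item in payload:
        if not isinstance(item, dict):
            continue
        category = item.get("category")
        size = item.get("estimated_bytes")
        reason = item.get("reason")
        if category not in RISK_ORDER:
            continue  # Unknown category: never guess how to display it.
        if isinstance(size, bool) or not isinstance(size, int) or size < 0:
            continue
        if not isinstance(reason, str) or not reason.strip():
            continue
        recs.append(Recommendation(category, size, reason.strip()))
    # Safest first, then largest saving first within the same risk level.
    recs.sort(key=lambda r: (RISK_ORDER[r.category], -r.estimated_bytes))
    return recs
```

Sorting on the app side rather than trusting the order the model returns is a deliberate choice in this sketch: the ranking rule stays deterministic and easy to test.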

Keep the human in the loop

One firm rule: the analyser recommends, it never deletes. Every suggestion is something you choose to act on. An LLM is excellent at triage and genuinely capable of being confidently wrong, so it earns a place advising the decision, not making it.
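In code, that rule can be as simple as making explicit approval the only path from a suggestion to an action. The sketch below is one hypothetical shape for it; `Suggestion` and `plan_cleanup` are illustrative names, and the function deliberately touches no files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    id: str
    target: str           # what the suggestion refers to, e.g. a folder
    estimated_bytes: int


def plan_cleanup(suggestions, approved_ids=()):
    """Return only the suggestions the user explicitly approved.

    Nothing is deleted here. The default is an empty approval set, so a
    caller that forgets to ask the user ends up with an empty plan.
    """
    approved = set(approved_ids)
    return [s for s in suggestions if s.id in approved]
```

The useful property is the default: with no approvals, the plan is empty, so a confidently wrong suggestion costs nothing until a person agrees with it.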

That is the pattern I keep coming back to for AI features in real software. Do the precise, verifiable work in code. Use the model for the judgement call in the middle. And always leave the final action with the person whose disk it actually is.