How Clean Reduces Cost

The seven cooperating mechanisms Clean uses to minimise the tokens an AI agent spends understanding your code.

When an AI agent works on a codebase, the dominant cost is its context window — the input tokens it reads, the output tokens it writes, and the round-trips that re-send the whole conversation on every turn. Clean is engineered end to end to deliver the most useful understanding per token. It does this with seven cooperating mechanisms, each attacking a different source of waste.

Clean's internal accounting uses the standard approximation tokens ≈ characters / 4, and every search measures its own savings (see get_token_savings).

1. Retrieval instead of exploration

The single largest lever. Without a semantic index, an agent answering "where is login handled?" runs grep, then reads a dozen whole files — most of them irrelevant — just to locate the code. Clean replaces that loop with one search_code call that returns exactly the right functions, ranked, with file paths, line numbers, and call-graph edges already attached. Because LanceDB stores each entity's location and source next to its vector, the answer to "where is this?" travels back with the match — no second filesystem step. Every file not read is pure savings.
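The point that location travels back with the match can be sketched without a database. The snippet below is a minimal in-memory stand-in, not Clean's code or LanceDB's API. `Entity`, `cosine`, and `search` are hypothetical names, and the toy three-dimensional vectors stand in for real embeddings. Each row keeps its path, line, and source beside the vector, so one lookup answers "where is this?" with no second filesystem step.

```python
import math
from dataclasses import dataclass

@dataclass
class Entity:
    # Location and source live in the same row as the vector, so every
    # match already answers "where is this?".
    name: str
    path: str
    line: int
    source: str
    vector: list[float]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(index, query_vec, top_k=5):
    """Return the top_k (score, entity) pairs by cosine similarity."""
    scored = [(cosine(e.vector, query_vec), e) for e in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

index = [
    Entity("handle_login", "auth/views.py", 42, "def handle_login(request): ...", [0.9, 0.1, 0.0]),
    Entity("render_footer", "ui/footer.py", 7, "def render_footer(): ...", [0.0, 0.2, 0.9]),
]
score, best = search(index, [1.0, 0.0, 0.0], top_k=1)[0]
print(best.path, best.line)  # auth/views.py 42
```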

2. Tiered result formatting

Clean never dumps full source in a search response. It returns a tiered summary that spends tokens in proportion to how likely each result is the one you want:

Tier	Ranks	Included
1	#1	Location, signature, up to 4 docstring lines, and full call graph.
2	#2–5	Location, signature, short docstring. No call graph.
3	#6+	Location + signature only.

A 300-line function that would cost 300 lines if dumped costs ~3–8 lines as a Tier-1 summary. Normalised relevance labels (Strong/Good/Moderate/Weak) help the agent stop early.
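As a rough sketch of the tiering rules in the table, the function below renders each hit with less detail as its rank grows. `Hit`, `format_hit`, and `format_results` are hypothetical names, and "short docstring" is assumed to mean the first non-empty docstring line; Clean's actual output layout may differ.

```python
from dataclasses import dataclass, field

@dataclass
class Hit:
    name: str
    path: str
    line: int
    signature: str
    docstring: str = ""
    callers: list[str] = field(default_factory=list)
    callees: list[str] = field(default_factory=list)

def format_hit(rank, hit):
    """Render one result; detail shrinks as the 1-based rank grows."""
    out = [f"#{rank} {hit.path}:{hit.line}  {hit.signature}"]
    doc = [ln for ln in hit.docstring.splitlines() if ln.strip()]
    if rank == 1:
        out += doc[:4]                       # Tier 1: up to 4 docstring lines
        out.append("calls: " + ", ".join(hit.callees))
        out.append("called by: " + ", ".join(hit.callers))
    elif rank <= 5:
        out += doc[:1]                       # Tier 2: short docstring, no call graph
    return "\n".join(out)                    # Tier 3 (#6+): location + signature only

def format_results(hits):
    return "\n\n".join(format_hit(i, h) for i, h in enumerate(hits, start=1))
```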

3. On-demand expansion

Tiered formatting withholds full source; the agent gets it back only when it asks, via expand_result. After each search the full result set is cached by rank, so expansion is a cheap, exact disk read — no re-embedding or re-searching. You pay the full-source cost only for the handful of results you truly need (often just rank #1). get_source applies the same discipline to arbitrary files, capping reads at 500 lines (2000 for a named function).
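A minimal sketch of rank-keyed caching and capped reads, assuming search results are plain dicts with a `source` field and the cache is a single JSON file. The cache path and the names `cache_results`, `expand`, and `read_capped` are hypothetical; this is not Clean's `expand_result` or `get_source` implementation.

```python
import json
from pathlib import Path

CACHE = Path(".clean_cache") / "last_search.json"  # hypothetical cache location

def cache_results(results, cache=CACHE):
    """After a search, persist the full result set keyed by 1-based rank."""
    cache.parent.mkdir(parents=True, exist_ok=True)
    by_rank = {str(i): r for i, r in enumerate(results, start=1)}
    cache.write_text(json.dumps(by_rank))

def expand(rank, cache=CACHE):
    """Return full source for one rank: a file read, no re-embedding or re-search."""
    return json.loads(cache.read_text())[str(rank)]["source"]

def read_capped(path, max_lines=500):
    """Read at most max_lines lines, never the whole file."""
    out = []
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            if i >= max_lines:
                break
            out.append(line)
    return "".join(out)
```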

4. TOON encoding & measurement

A JSON array repeats every field name on every row. TOON (Token-Optimized Object Notation) hoists field names into a single header and emits aligned columns, paying for each key once — a 30–40% token saving versus equivalent JSON. On every search Clean formats the results twice (compact + a full-JSON baseline) and banks the character-count delta, so the headline "tokens saved" figure is a measured fact, not an estimate. Read the running total with get_token_savings.
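Paying for each key once, together with the format-twice measurement, can be sketched in a few lines. This is a simplified TOON-style layout (a header carrying the row count and field list, then one comma-separated row per item), not a full TOON encoder: it assumes uniform rows and does no quoting or escaping. The estimate uses the characters / 4 approximation stated above; `to_tabular`, `estimate_tokens`, and `measured_saving` are hypothetical names.

```python
import json

def estimate_tokens(text):
    # Approximation stated above: tokens ≈ characters / 4.
    return len(text) / 4

def to_tabular(key, rows):
    """Simplified TOON-style encoding of a list of uniform dicts.

    Field names are written once in the header instead of once per row.
    No quoting: values containing commas or newlines would break it.
    """
    fields = list(rows[0])
    header = f"{key}[{len(rows)}]{{{','.join(fields)}}}:"
    body = ["  " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header, *body])

def measured_saving(key, rows):
    """Format twice (compact and a JSON baseline) and return the token delta."""
    compact = to_tabular(key, rows)
    baseline = json.dumps({key: rows})
    return estimate_tokens(baseline) - estimate_tokens(compact)
```

For two search hits with `name`, `path`, and `line` fields, `to_tabular("results", rows)` emits a `results[2]{name,path,line}:` header followed by two value-only rows, while the JSON baseline repeats all three keys on both rows.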

5. Hybrid retrieval precision

Every unnecessary follow-up search costs a full round-trip. Pure embeddings are good at matching behaviour but can rank an exact identifier lower than expected. Clean detects identifier-shaped tokens in the query (PascalCase, camelCase, snake_case, dotted paths, file fragments) and, when any are present, fuses semantic similarity with direct name and path matches:

semantic match  → score × 0.6
name match      → score + 0.3
path match      → score + 0.1

An entity that is both a strong semantic match and a name match floats decisively to rank 1, so the agent finds it on the first try and skips a second search. Natural-language queries with no identifier-shaped tokens fall straight back to pure semantic search.
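A sketch of the detect-then-fuse step, using the weights listed above. The regular expression is an assumed heuristic for the identifier shapes the text names, and the matching rules (case-insensitive exact match on the entity name, substring match on the path) are assumptions too; `IDENTIFIER`, `fused_score`, and `rank` are hypothetical names.

```python
import re

# Assumed heuristic for identifier-shaped tokens; Clean's real rules may differ.
IDENTIFIER = re.compile(
    r"[A-Z][a-z0-9]+[A-Z]\w*"   # PascalCase, e.g. EmailValidator
    r"|[a-z]+[A-Z]\w*"          # camelCase, e.g. validateEmail
    r"|\w+_\w+"                 # snake_case, e.g. validate_email
    r"|\w+(?:\.\w+)+"           # dotted paths and file fragments, e.g. forms.py
)

def fused_score(semantic, name, path, tokens):
    """0.6 x semantic, +0.3 for a name match, +0.1 for a path match."""
    lowered = [t.lower() for t in tokens]
    score = 0.6 * semantic
    if name.lower() in lowered:                    # assumed: exact name match
        score += 0.3
    if any(t in path.lower() for t in lowered):    # assumed: substring path match
        score += 0.1
    return score

def rank(query, candidates):
    """Order (semantic_score, name, path) candidates for a query."""
    tokens = IDENTIFIER.findall(query)
    if not tokens:
        # Natural-language query: fall back to pure semantic ranking.
        return sorted(candidates, key=lambda c: c[0], reverse=True)
    return sorted(candidates, key=lambda c: fused_score(*c, tokens), reverse=True)
```

With the query `where is validate_email defined`, the name bonus lifts `validate_email` (semantic 0.70, fused 0.72) above a candidate with a higher raw semantic score of 0.80 (fused 0.48); a plain-English query yields no tokens and is ranked by similarity alone.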

6. Batched context expansion

Attaching the top result's call graph pre-answers "what does this call?" and "who calls this?". Gathering it naively is O(branching^depth) queries; Clean issues one batched query per depth level (O(depth)), tracks visited nodes to handle cycles, and caps each direction at 50 entities. Fewer round-trips, bounded response.
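The level-by-level strategy can be sketched as a breadth-first walk that sends one batched lookup per depth level. `fetch_callees` is a hypothetical batched query (a set of names in, a dict of edges out), and `expand_call_graph` is an illustrative name; the callers direction would run the same loop against a callers lookup, with its own cap.

```python
def expand_call_graph(root, fetch_callees, depth=1, cap=50):
    """Collect callees of `root` up to `depth` levels, at most `cap` entities.

    `fetch_callees(names)` is a hypothetical batched lookup returning
    {name: [callee names]} for a whole set of names in ONE query, so the
    walk issues at most `depth` queries instead of branching ** depth.
    """
    visited = {root}          # guards against cycles such as a -> b -> a
    frontier = {root}
    collected = []
    for _ in range(depth):
        if not frontier:
            break
        edges = fetch_callees(frontier)          # one batched query per level
        next_frontier = set()
        for name in sorted(frontier):            # sorted for deterministic output
            for callee in edges.get(name, []):
                if callee in visited:
                    continue
                visited.add(callee)
                collected.append(callee)
                next_frontier.add(callee)
                if len(collected) >= cap:        # bound the response size
                    return collected
        frontier = next_frontier
    return collected
```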

7. Incremental indexing and staleness

Re-embedding an entire repo on every change wastes compute. Clean's incremental indexer classifies every file as added, modified, deleted, or unchanged (using a git diff when available and falling back to content hashing) and re-embeds only what changed. On every search_code call, a staleness check (git HEAD + git status, or mtime) decides whether the index is out of date; if it is, the re-index is launched fire-and-forget and the search proceeds immediately against the existing index. The agent is never blocked waiting for re-embedding.
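A minimal sketch of the content-hash fallback and the non-blocking staleness check, assuming a saved `{relative_path: sha256}` manifest and, for brevity, only `*.py` files. `file_digest`, `classify`, `search_with_staleness`, and the callables passed to it are hypothetical; the git-diff path is not shown, and a real indexer would also need to stop two background re-indexes from overlapping.

```python
import hashlib
import threading
from pathlib import Path

def file_digest(path):
    # Content hash: a file counts as modified only if its bytes changed.
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def classify(previous, root):
    """Diff a saved {relative_path: digest} manifest against files on disk.

    Returns (changes, current_manifest). Only added and modified files need
    re-embedding; deleted files need their rows dropped.
    """
    root = Path(root)
    current = {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*.py")
        if p.is_file()
    }
    common = current.keys() & previous.keys()
    changes = {
        "added": sorted(current.keys() - previous.keys()),
        "modified": sorted(p for p in common if current[p] != previous[p]),
        "deleted": sorted(previous.keys() - current.keys()),
        "unchanged": sorted(p for p in common if current[p] == previous[p]),
    }
    return changes, current

def search_with_staleness(query, is_stale, reindex, search):
    """Never block on re-indexing: start it in the background, then answer
    from the existing index straight away."""
    if is_stale():
        threading.Thread(target=reindex, daemon=True).start()  # fire-and-forget
    return search(query)
```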

A worked example

For "How does email validation work on signup?" in a 50,000-line repo:

Approach	Tokens to locate the code
Without Clean: grep + read 5–6 whole files + re-grep	~13,400
With Clean: one `search_code` + one `expand_result`	~1,050

The order-of-magnitude difference comes from stacking the mechanisms — and get_token_savings lets you watch the real numbers accrue.
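The arithmetic behind "order of magnitude", re-derived from the two approximate figures in the table (this is not a new measurement):

```python
without_clean = 13_400   # grep + read 5–6 whole files + re-grep (table figure)
with_clean = 1_050       # one search_code + one expand_result (table figure)

ratio = without_clean / with_clean
saving = 1 - with_clean / without_clean
print(f"{ratio:.1f}x fewer tokens, {saving:.0%} saved")  # 12.8x fewer tokens, 92% saved
```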

Tuning knobs

Knob	Default	Effect
`top_k`	5 (max 50)	More results mean a larger response. Use 3 for targeted lookups, 10–15 for exploration.
`depth`	1 (max 3)	Deeper call-graph context in Tier 1.
`get_source` cap	500 lines (2000 for `function=`)	Prevents whole-file inhalation.
Embedding model	`all-MiniLM-L6-v2`	Smaller models index faster and more cheaply.
