How Clean Reduces Cost
The seven cooperating mechanisms Clean uses to minimise the tokens an AI agent spends understanding your code.
When an AI agent works on a codebase, the dominant cost is its context window — the input tokens it reads, the output tokens it writes, and the round-trips that re-send the whole conversation on every turn. Clean is engineered end to end to deliver the most useful understanding per token. It does this with seven cooperating mechanisms, each attacking a different source of waste.
Clean's internal accounting uses the standard approximation tokens ≈ characters / 4, and every search measures its own savings (see get_token_savings).
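As a rough illustration of that accounting rule, the approximation can be sketched in a few lines. The function names `estimate_tokens` and `estimated_savings` are hypothetical and are not part of Clean's API.

```python
def estimate_tokens(text: str) -> int:
    """Approximate a token count with the characters / 4 rule of thumb."""
    return len(text) // 4


def estimated_savings(baseline: str, compact: str) -> int:
    """Tokens saved by sending `compact` instead of `baseline` (never negative)."""
    return max(0, estimate_tokens(baseline) - estimate_tokens(compact))
```

The estimate is deliberately tokenizer-agnostic: it trades exactness for a figure that can be computed on every search at negligible cost.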
1. Retrieval instead of exploration
The single largest lever. Without a semantic index, an agent answering "where is login handled?" runs grep, then reads a dozen whole files, most of them irrelevant, just to locate the code. Clean replaces that loop with one search_code call that returns the most relevant functions, ranked, with file paths, line numbers, and (for the top result) call-graph edges already attached. Because LanceDB stores each entity's location and source next to its vector, the answer to "where is this?" travels back with the match, with no second filesystem step. Every file the agent does not read is pure savings.
2. Tiered result formatting
Clean never dumps full source in a search response. It returns a tiered summary that spends tokens in proportion to how likely each result is the one you want:
| Tier | Ranks | Included |
|---|---|---|
| 1 | #1 | Location, signature, up to 4 docstring lines, and full call graph. |
| 2 | #2–5 | Location, signature, short docstring. No call graph. |
| 3 | #6+ | Location + signature only. |
Dumped in full, a 300-line function costs 300 lines; as a Tier-1 summary it costs roughly 3–8. Normalised relevance labels (Strong/Good/Moderate/Weak) help the agent stop early.
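The tier table can be read as a small formatting rule keyed on rank. The sketch below is one possible reading of it, not Clean's actual formatter: the `Result` dataclass, its field names, and the output layout are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    # Hypothetical result shape; the field names are assumptions.
    path: str
    line: int
    signature: str
    docstring: str = ""
    calls: list[str] = field(default_factory=list)
    callers: list[str] = field(default_factory=list)


def tier_for_rank(rank: int) -> int:
    """Rank 1 is Tier 1, ranks 2-5 are Tier 2, everything else is Tier 3."""
    if rank == 1:
        return 1
    return 2 if rank <= 5 else 3


def format_result(rank: int, r: Result) -> list[str]:
    """Spend more lines on results that are more likely to be the right one."""
    tier = tier_for_rank(rank)
    lines = [f"#{rank} {r.path}:{r.line}  {r.signature}"]
    doc = [ln.strip() for ln in r.docstring.splitlines() if ln.strip()]
    if tier == 1:
        lines += [f"    {ln}" for ln in doc[:4]]  # up to 4 docstring lines
        if r.calls:
            lines.append("    calls: " + ", ".join(r.calls))
        if r.callers:
            lines.append("    called by: " + ", ".join(r.callers))
    elif tier == 2:
        lines += [f"    {ln}" for ln in doc[:1]]  # short docstring only
    return lines
```

Under these assumptions a Tier-3 result is always a single line, which is what keeps a large top_k affordable.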
3. On-demand expansion
Tiered formatting withholds full source; the agent gets it back only when it asks, via expand_result. After each search the full result set is cached by rank, so expansion is a cheap, exact disk read — no re-embedding or re-searching. You pay the full-source cost only for the handful of results you truly need (often just rank #1). get_source applies the same discipline to arbitrary files, capping reads at 500 lines (2000 for a named function).
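A minimal sketch of the cache-then-expand pattern follows. It assumes each cached entry is a `(path, start_line, end_line)` tuple; the `ResultCache` class and its methods are hypothetical names, and Clean's real cache layout may differ.

```python
from pathlib import Path


class ResultCache:
    """Remembers where each ranked result lives so expansion needs no new search."""

    def __init__(self) -> None:
        self._by_rank: dict[int, tuple[str, int, int]] = {}

    def store(self, locations: list[tuple[str, int, int]]) -> None:
        # Rank 1 is the best match; line numbers are 1-indexed and inclusive.
        self._by_rank = dict(enumerate(locations, start=1))

    def expand(self, rank: int) -> str:
        # A plain disk read: no embedding model and no index lookup involved.
        path, start, end = self._by_rank[rank]
        lines = Path(path).read_text(encoding="utf-8").splitlines()
        return "\n".join(lines[start - 1:end])
```

Because the cache holds only locations, it stays tiny even for large result sets, and the source it returns is whatever is on disk at expansion time.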
4. TOON encoding & measurement
A JSON array repeats every field name on every row. TOON (Token-Oriented Object Notation) hoists field names into a single header and emits aligned columns, paying for each key once: a 30–40% token saving versus equivalent JSON. On every search Clean formats the results twice (compact + a full-JSON baseline) and banks the character-count delta, so the headline "tokens saved" figure comes from a measured character difference, converted to tokens with the characters / 4 approximation, rather than from a guess. Read the running total with get_token_savings.
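The header-hoisting idea and the double-format measurement can be sketched together. The encoder below is a simplified header-plus-rows format in the spirit of TOON, not the real TOON grammar (it does no quoting or escaping, so values containing commas would break it), and `encode_table` and `measure_savings` are hypothetical names.

```python
import json


def encode_table(rows: list[dict]) -> str:
    """Emit field names once as a header, then one comma-separated line per row."""
    if not rows:
        return ""
    keys = list(rows[0])
    header = ",".join(keys)
    body = [",".join(str(row[k]) for k in keys) for row in rows]
    return "\n".join([header, *body])


def measure_savings(rows: list[dict]) -> dict:
    """Format twice, then bank the character delta as approximate tokens."""
    baseline = json.dumps(rows)   # full-JSON baseline
    compact = encode_table(rows)  # compact encoding
    saved_chars = len(baseline) - len(compact)
    return {
        "baseline_chars": len(baseline),
        "compact_chars": len(compact),
        "saved_tokens": saved_chars // 4,  # characters / 4 approximation
    }
```

The saving grows with row count, since the JSON baseline pays for every key on every row while the compact form pays for them once.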
5. Hybrid retrieval precision
Every unnecessary follow-up search is a full round-trip. Pure embeddings are good at matching descriptions of behaviour but can rank an exact identifier lower than expected. Clean detects identifier-shaped tokens in the query (PascalCase, camelCase, snake_case, dotted paths, file fragments) and, when present, fuses semantic similarity with direct name and path matches.
An entity that's both a strong semantic and name match floats decisively to rank 1 — so the agent finds it first try and skips a second search. Natural-language queries fall straight back to pure semantic search.
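To make the detect-then-fuse idea concrete, here is a minimal sketch. The regular expression, the additive formula, and the weights `name_weight` and `path_weight` are all illustrative assumptions; Clean's actual detection rules and fusion formula are not shown in this document.

```python
import re

# Assumed patterns for "identifier-shaped" tokens; deliberately loose.
IDENTIFIER = re.compile(
    r"""
    ^(?:
        [A-Z][a-z0-9]+(?:[A-Z][a-z0-9]*)+     # PascalCase
      | [a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)+     # camelCase
      | [A-Za-z0-9]+(?:_[A-Za-z0-9]+)+        # snake_case
      | [A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+       # dotted path or file name
      | [\w.-]+/[\w./-]+                      # path fragment
    )$
    """,
    re.VERBOSE,
)


def identifier_tokens(query: str) -> list[str]:
    """Return the query tokens that look like code identifiers."""
    tokens = (t.strip("?!.,;:\"'()") for t in query.split())
    return [t for t in tokens if IDENTIFIER.match(t)]


def fused_score(query: str, name: str, path: str, semantic: float,
                name_weight: float = 0.3, path_weight: float = 0.2) -> float:
    """Boost the semantic score when an identifier token matches name or path."""
    idents = identifier_tokens(query)
    if not idents:
        return semantic  # natural-language query: pure semantic search
    lowered = [t.lower() for t in idents]
    name_hit = any(t == name.lower() or t.split(".")[-1] == name.lower()
                   for t in lowered)
    path_hit = any(t in path.lower() for t in lowered)
    return semantic + name_weight * name_hit + path_weight * path_hit
```

With these assumed weights, an exact name match outranks a slightly stronger but purely semantic neighbour, which is the behaviour described above.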
6. Batched context expansion
Attaching the top result's call graph pre-answers "what does this call?" and "who calls this?". Gathering it naively is O(branching^depth) queries; Clean issues one batched query per depth level (O(depth)), tracks visited nodes to handle cycles, and caps each direction at 50 entities. Fewer round-trips, bounded response.
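The level-by-level traversal can be sketched as a breadth-first walk that makes one batched lookup per depth level. `fetch_neighbors` is a hypothetical callback standing in for the batched query: it takes a list of entities and returns a dict mapping each one to its neighbours in a single direction (callees or callers).

```python
from typing import Callable


def expand_call_graph(
    start: str,
    fetch_neighbors: Callable[[list[str]], dict[str, list[str]]],
    depth: int = 1,
    cap: int = 50,
) -> list[str]:
    """Collect up to `cap` entities reachable within `depth` hops of `start`."""
    visited = {start}   # guards against cycles in the call graph
    frontier = [start]
    found: list[str] = []
    for _ in range(depth):
        if not frontier:
            break
        edges = fetch_neighbors(frontier)  # one batched query for the whole level
        next_frontier: list[str] = []
        for node in frontier:
            for neighbor in edges.get(node, []):
                if neighbor in visited:
                    continue
                visited.add(neighbor)
                found.append(neighbor)
                next_frontier.append(neighbor)
                if len(found) >= cap:
                    return found  # bounded response
        frontier = next_frontier
    return found
```

The number of queries equals the number of levels visited, regardless of how many entities each level contains. Running it once per direction gives the "calls" and "called by" lists.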
7. Incremental indexing and staleness
Re-embedding an entire repo on every change wastes compute. Clean's incremental indexer classifies every file as added / modified / deleted / unchanged, using a git diff when available and falling back to content hashing, and re-embeds only what changed. On every search_code, a staleness check (git HEAD + git status, or mtime) decides whether the index is out of date; if it is, the re-index is launched in the background (fire-and-forget) and the search proceeds immediately against the existing index. The agent is never blocked waiting for re-embedding.
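Both halves of this mechanism can be sketched briefly: the content-hashing fallback for classifying files, and a fire-and-forget re-index that never delays the search. All function names here are hypothetical, SHA-256 is an assumed choice of hash, and the git-diff path is omitted.

```python
import hashlib
import threading
from typing import Callable


def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def classify(previous: dict[str, str], current: dict[str, bytes]) -> dict[str, list[str]]:
    """Compare stored hashes (`previous`) against file contents on disk (`current`)."""
    changes: dict[str, list[str]] = {
        "added": [], "modified": [], "deleted": [], "unchanged": [],
    }
    for path, data in current.items():
        if path not in previous:
            changes["added"].append(path)
        elif previous[path] != content_hash(data):
            changes["modified"].append(path)
        else:
            changes["unchanged"].append(path)
    changes["deleted"] = [p for p in previous if p not in current]
    return {kind: sorted(paths) for kind, paths in changes.items()}


def search_with_background_reindex(
    query: str,
    search: Callable[[str], list],
    is_stale: Callable[[], bool],
    reindex: Callable[[], None],
) -> list:
    """Kick off re-indexing without waiting for it, then search the existing index."""
    if is_stale():
        threading.Thread(target=reindex, daemon=True).start()
    return search(query)
```

Only the `added` and `modified` lists need re-embedding; `deleted` entries are dropped from the index and `unchanged` files cost nothing.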
A worked example
For "How does email validation work on signup?" in a 50,000-line repo:
| Approach | Tokens to locate the code |
|---|---|
| Without Clean: grep + read 5–6 whole files + re-grep | ~13,400 |
| With Clean: one search_code + one expand_result | ~1,050 |
The order-of-magnitude difference comes from stacking the mechanisms — and get_token_savings lets you watch the real numbers accrue.
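The arithmetic behind "order of magnitude" is short enough to check directly, using only the two approximate figures from the table:

```python
without_clean = 13_400  # approximate tokens: grep + whole-file reads
with_clean = 1_050      # approximate tokens: search_code + expand_result

ratio = without_clean / with_clean
saved_pct = 100 * (without_clean - with_clean) / without_clean

print(f"{ratio:.1f}x fewer tokens, {saved_pct:.1f}% saved")
# prints: 12.8x fewer tokens, 92.2% saved
```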
Tuning knobs
| Knob | Default | Effect |
|---|---|---|
| top_k | 5 (max 50) | More results = larger response. Use 3 for targeted, 10–15 for exploration. |
| depth | 1 (max 3) | Deeper call-graph context in Tier 1. |
| get_source cap | 500 lines (2000 for function=) | Prevents whole-file inhalation. |
| Embedding model | all-MiniLM-L6-v2 | A smaller model indexes faster and cheaper. |