Clean
MCP Tools

search_code

Codebase-wide semantic search that understands what code does, not just what it's named.

search_code is the core tool. In a single call it returns file paths, line numbers, signatures, the call graph (callers and callees), and neighbouring functions, and the complete source of any result is one expand_result call away. It replaces grep, find, glob, and manual file reading when you need to understand code.

Inputs

  • query (string, required): Natural-language description of the behaviour you're looking for. Describe what the code does, not its identifier.
  • repo (string, optional, default: auto): Repository in owner/repo format. Omit it to auto-select when only one repo is indexed, or to use the server's pinned default.
  • branch (string, optional, default: the default branch): Git branch to search. Only needed if you indexed multiple branches of the same repo.
  • cwd (string, optional, no default): Absolute path to the user's working directory. When provided, the server runs git branch there to auto-detect which branch to search.
  • top_k (integer, optional, default: 5): Number of results to return. Use 3 for a targeted lookup and 10–15 for broad exploration. Values above 50 are clamped to 50.
  • depth (integer, optional, default: 1): How far to expand context around the top result: 0 = matches only, 1 = direct callers and callees, 2 = two levels. Values above 3 are clamped to 3.
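
The defaults and clamps above, and what each depth level covers, can be sketched client-side as follows. This is a minimal illustration, not Clean's implementation: normalize_arguments and expand_context are hypothetical helpers, the lower bounds (top_k of at least 1, depth of at least 0) are assumptions because only the upper clamps are documented, and the call graph is modelled as a plain dictionary mapping each function to its callers and callees.

```python
from collections import deque
from typing import Dict, Set

# Documented defaults and upper clamps for search_code.
DEFAULT_TOP_K, MAX_TOP_K = 5, 50
DEFAULT_DEPTH, MAX_DEPTH = 1, 3


def normalize_arguments(args: dict) -> dict:
    """Apply the documented defaults and upper clamps (lower bounds are assumed)."""
    if not args.get("query"):
        raise ValueError("query is required")
    normalized = dict(args)
    normalized["top_k"] = max(1, min(int(args.get("top_k", DEFAULT_TOP_K)), MAX_TOP_K))
    normalized["depth"] = max(0, min(int(args.get("depth", DEFAULT_DEPTH)), MAX_DEPTH))
    return normalized


def expand_context(graph: Dict[str, Set[str]], start: str, depth: int) -> Set[str]:
    """Collect functions within `depth` call-graph hops of `start`, breadth-first.

    `graph` (hypothetical) maps each function to its neighbours: callers and callees.
    depth 0 adds nothing, depth 1 adds direct callers/callees, and so on.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == depth:
            continue  # do not expand past the requested depth
        for neighbour in graph.get(node, set()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return seen - {start}
```

With depth 1 only the top result's direct neighbours come back; each extra level adds the neighbours of the previous level, which is why larger depths grow the response quickly.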

Writing good queries

Describe behaviour, not names:

  • "function that validates email format before signup", not "validateEmail"
  • "middleware that checks authentication on API routes", not "authMiddleware"
  • "error handling in payment processing", not "try catch payment"

What you get back

Results are returned as a tiered summary, so the response stays small:

  • Rank #1 — location, line count, signature, up to 4 docstring lines, and the full call graph (CALLS →, CALLED BY ←, SAME FILE).
  • Ranks #2–5 — location, signature, and a short docstring snippet. No call graph.
  • Ranks #6+ — the most compact form: location + signature only.

Each result carries a normalised relevance label (Strong, Good, Moderate, Weak) so the agent can stop early. Full source for any truncated result is one expand_result call away.
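
The tiering can be pictured as a function of rank. The sketch below assumes a hypothetical Hit record and output layout, including the bracketed relevance label and the choice of the first docstring line as the "short snippet"; only the contents of each tier come from the list above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hit:
    # Hypothetical result record; Clean's real response format may differ.
    rank: int
    label: str                  # Strong, Good, Moderate, or Weak
    location: str               # e.g. "src/auth/login.py:42"
    signature: str
    docstring: str = ""
    line_count: int = 0
    calls: List[str] = field(default_factory=list)
    called_by: List[str] = field(default_factory=list)
    same_file: List[str] = field(default_factory=list)


def render(hit: Hit) -> str:
    """Render one hit at the level of detail its rank earns."""
    head = f"#{hit.rank} [{hit.label}] {hit.location}"
    if hit.rank == 1:
        # Rank #1: line count, signature, up to 4 docstring lines, full call graph.
        return "\n".join([
            f"{head} ({hit.line_count} lines)",
            hit.signature,
            *hit.docstring.splitlines()[:4],
            "CALLS → " + ", ".join(hit.calls),
            "CALLED BY ← " + ", ".join(hit.called_by),
            "SAME FILE: " + ", ".join(hit.same_file),
        ])
    if hit.rank <= 5:
        # Ranks #2-5: signature plus a short snippet (assumed: first docstring line).
        snippet = hit.docstring.splitlines()[:1]
        return "\n".join([head, hit.signature, *snippet])
    # Ranks #6+: location and signature only.
    return f"{head} {hit.signature}"
```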

Behind the scenes, search_code uses hybrid retrieval. If your query contains identifier-shaped tokens (PascalCase, camelCase, snake_case, dotted paths, file fragments), it blends semantic similarity with direct name/path matches so that an exact name floats to the top. Plain natural-language queries fall back to pure semantic search. See "How Clean reduces cost".
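
One way to picture the hybrid step, assuming simple regular expressions for identifier-shaped tokens and a fixed 50/50 blend of semantic and lexical scores. The patterns, the weight, and the half-credit for path matches are illustrative assumptions; Clean does not document its actual rules here.

```python
import re
from typing import List

# Illustrative patterns for identifier-shaped tokens (assumptions, not Clean's rules).
_IDENTIFIER_PATTERNS = [
    re.compile(r"^[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]*)+$"),  # PascalCase: AuthMiddleware
    re.compile(r"^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)+$"),  # camelCase: validateEmail
    re.compile(r"^[a-z0-9]+(?:_[a-z0-9]+)+$"),           # snake_case: handle_login
    re.compile(r"^\w+(?:\.\w+)+$"),                      # dotted path or file name: auth.views, login.py
    re.compile(r"/"),                                    # path fragment: src/auth
]


def identifier_tokens(query: str) -> List[str]:
    """Return the query tokens that look like identifiers or paths."""
    tokens = (raw.strip(",;:()?!\"'") for raw in query.split())
    return [t for t in tokens if t and any(p.search(t) for p in _IDENTIFIER_PATTERNS)]


def blended_score(semantic: float, name: str, path: str, tokens: List[str],
                  weight: float = 0.5) -> float:
    """Mix semantic similarity with a lexical bonus for exact name or path matches."""
    if not tokens:
        return semantic  # plain natural language: pure semantic search
    lowered = [t.lower() for t in tokens]
    if name.lower() in lowered:
        lexical = 1.0    # exact name match
    elif any(t in path.lower() for t in lowered):
        lexical = 0.5    # token appears in the file path
    else:
        lexical = 0.0
    return (1 - weight) * semantic + weight * lexical
```

Under these assumptions a function literally named validateEmail outranks a semantically closer but differently named one, while a purely descriptive query never touches the lexical term.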

Example

"Find the function that handles login redirects"

The agent calls:

{
  "name": "search_code",
  "arguments": {
    "query": "function that handles login redirects after authentication",
    "top_k": 5,
    "depth": 1
  }
}

and receives a ranked, tiered summary with rank #1's callers and callees attached.
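
Agents normally send this through their MCP client, which wraps the tool name and arguments in a JSON-RPC 2.0 tools/call request. A minimal sketch of that envelope, assuming a client that assigns increasing integer ids; transport (stdio or HTTP) is out of scope, and tools_call_request is a hypothetical helper:

```python
import json
from itertools import count

_request_ids = count(1)  # request ids must not be reused within a session


def tools_call_request(name: str, arguments: dict) -> str:
    """Serialise an MCP tools/call request (JSON-RPC 2.0) for the given tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# The example call above, wrapped for the wire.
request = tools_call_request("search_code", {
    "query": "function that handles login redirects after authentication",
    "top_k": 5,
    "depth": 1,
})
```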

Notes

  • If the index is stale (the underlying repo changed), the local edition fires a non-blocking re-index and returns results against the current index immediately — you're never blocked waiting for re-embedding.
  • Search has a 30-second timeout; simplify the query if you hit it.
  • A footer reminds the agent that rank #1 has the most detail and to only search again for a genuinely different concept.
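
The non-blocking re-index in the first note follows a common "serve the current index, rebuild in the background" pattern. The sketch below is hypothetical: LocalIndex, the build callback (standing in for re-embedding), and the substring match (standing in for semantic search) are assumptions used only to show that a search never waits on the rebuild.

```python
import threading
from typing import Callable, Dict, List, Optional


class LocalIndex:
    """Hypothetical index that answers from its current snapshot while rebuilding in the background."""

    def __init__(self, build: Callable[[str], Dict[str, str]], commit: str):
        self._build = build            # stand-in for re-embedding the repo at a commit
        self._commit = commit          # commit the current snapshot was built from
        self._docs = build(commit)     # path -> indexed text
        self._lock = threading.Lock()
        self._worker: Optional[threading.Thread] = None

    def search(self, query: str, head_commit: str) -> List[str]:
        with self._lock:
            docs, indexed_commit = self._docs, self._commit
        if head_commit != indexed_commit:
            self._start_reindex(head_commit)  # fire and forget: do not wait for it
        # Answer from whatever snapshot exists right now.
        return sorted(path for path, text in docs.items() if query in text)

    def _start_reindex(self, commit: str) -> None:
        with self._lock:
            if self._worker is not None and self._worker.is_alive():
                return  # a rebuild is already in flight
            self._worker = threading.Thread(target=self._reindex, args=(commit,), daemon=True)
            self._worker.start()

    def _reindex(self, commit: str) -> None:
        docs = self._build(commit)  # the slow part, off the request path
        with self._lock:
            self._docs, self._commit = docs, commit

    def wait(self) -> None:
        """Block until any in-flight rebuild finishes (handy in tests)."""
        worker = self._worker
        if worker is not None:
            worker.join()
```

The first search after the repo changes still returns results from the old snapshot; later searches see the new one once the background rebuild swaps it in.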
