Architecture
A high-level tour of how Super-Brain turns raw code into a queryable knowledge graph.
The five layers
Super-Brain is organized as five cooperating layers. Each has a single job and a clean contract with the next.
```
┌──────────────────────────────────────────────────┐
│ 5. Interface      CLI · MCP server · Reports     │
├──────────────────────────────────────────────────┤
│ 4. Intelligence   Hybrid retriever · LLM · Tools │
├──────────────────────────────────────────────────┤
│ 3. Storage        KùzuDB graph + Qdrant vectors  │
├──────────────────────────────────────────────────┤
│ 2. Extraction     Deterministic AST parsers      │
├──────────────────────────────────────────────────┤
│ 1. Preprocessing  Tree-sitter · MarkItDown       │
│                   · faster-whisper · yt-dlp      │
└──────────────────────────────────────────────────┘
```
Everything below layer 4 is deterministic. No LLM calls happen during ingestion.
Layer 1 — Preprocessing
Job: turn raw inputs (code files, documents, audio) into parsed intermediate representations.
- Code — tree-sitter with tree-sitter-language-pack. 306 languages, one consistent AST interface.
- Documents — MarkItDown converts PDF, DOCX, PPTX, MD, TXT, HTML into normalized markdown with headings preserved.
- Audio / video — yt-dlp fetches remote sources; FFmpeg normalizes to WAV; faster-whisper transcribes locally with word-level timestamps.
No network calls happen here unless the user explicitly supplies a remote URL, such as a YouTube link. Transcription runs entirely on-device.
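To make the routing concrete, here is a minimal Python sketch of how a Layer 1 dispatcher might decide which preprocessor handles an input. The function name route_input and the code and audio/video extension sets are hypothetical; the real pipeline covers 306 languages through tree-sitter-language-pack rather than a short extension list. The document extensions are the ones listed above.

```python
from pathlib import Path
from urllib.parse import urlparse

# Document formats named in the Layer 1 description.
DOCUMENT_EXTS = {".pdf", ".docx", ".pptx", ".md", ".txt", ".html"}
# Hypothetical, deliberately short lists for illustration only.
AUDIO_VIDEO_EXTS = {".mp3", ".wav", ".m4a", ".mp4", ".mkv", ".webm"}
CODE_EXTS = {".py", ".js", ".ts", ".go", ".rs", ".java", ".c", ".cpp"}


def route_input(source: str) -> str:
    """Return the name of the preprocessor that would handle `source`."""
    if urlparse(source).scheme in ("http", "https"):
        # The only branch that needs the network: yt-dlp fetches the media,
        # then FFmpeg and faster-whisper run locally.
        return "remote"
    suffix = Path(source).suffix.lower()
    if suffix in DOCUMENT_EXTS:
        return "document"  # MarkItDown -> normalized markdown
    if suffix in AUDIO_VIDEO_EXTS:
        return "audio"  # FFmpeg -> WAV -> faster-whisper
    if suffix in CODE_EXTS:
        return "code"  # tree-sitter parse
    return "unsupported"
```

For example, route_input("notes/Design.PDF") returns "document", while a YouTube watch URL returns "remote".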
Layer 2 — Extraction
Job: turn parsed trees and transcripts into graph-ready entities and edges.
- Rule-based AST queries produce Function, Method, and Class nodes along with their call edges. Rules are language-specific for tier-1 languages and generic for the other 290+.
- Import resolution links symbols across files by resolving relative and absolute imports.
- Document extraction breaks markdown into Section nodes, one per heading, preserving the heading hierarchy.
- Audio extraction chunks transcripts into Transcript nodes with start/end timestamps.
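The rule-based idea can be illustrated on Python source. This sketch swaps in Python's standard-library ast module for tree-sitter so it runs with no dependencies; the (kind, qualified_name) node tuples and (caller, callee_name) edges are a hypothetical shape, not Super-Brain's internal format. Call targets are left as bare names; resolving them to definitions is a separate step (import and call resolution).

```python
import ast

SAMPLE = '''
class Greeter:
    def greet(self, name):
        return format_name(name)

def format_name(name):
    return name.title()

def main():
    Greeter().greet("ada")
    format_name("bob")
'''


def extract(source: str, module: str):
    """Return (nodes, calls) from a deterministic AST walk; no LLM involved."""
    tree = ast.parse(source)
    nodes, calls = [], []

    def record_function(fn, qname, kind):
        nodes.append((kind, qname))
        for sub in ast.walk(fn):
            if isinstance(sub, ast.Call):
                target = sub.func
                if isinstance(target, ast.Name):  # format_name(...)
                    calls.append((qname, target.id))
                elif isinstance(target, ast.Attribute):  # obj.greet(...)
                    calls.append((qname, target.attr))

    # Only module-level functions and class methods become nodes; calls inside
    # nested defs are attributed to the enclosing function.
    for top in tree.body:
        if isinstance(top, ast.ClassDef):
            cls = f"{module}.{top.name}"
            nodes.append(("Class", cls))
            for item in top.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    record_function(item, f"{cls}.{item.name}", "Method")
        elif isinstance(top, (ast.FunctionDef, ast.AsyncFunctionDef)):
            record_function(top, f"{module}.{top.name}", "Function")
    return nodes, calls


nodes, calls = extract(SAMPLE, "app")
```

On the sample, app.format_name gets an edge to title, a method of the built-in str; that is the sort of target that would presumably be treated as an external stub at query time.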
Every extraction rule is deterministic. Given the same input, you get identical output every run — which means you can commit the graph, diff it, and trust it.
Layer 3 — Storage
Job: persist extracted nodes and edges; serve them to the query layer with low latency.
Two stores work together:
KùzuDB — the graph
KùzuDB is an embedded graph database. No server, no daemon. The whole database is a file (or a directory of files) in your project. Supports Cypher, ACID transactions, and multi-gigabyte graphs on commodity hardware.
Node types: Module, Function, Class, Document, Section, Concept, Audio, Transcript.
Edge types: CALLS, DEFINES, IMPORTS, CONTAINS, REFERENCES, TRANSCRIBES, LINKED_TO.
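For a feel of what queries against this schema look like, here are two Cypher strings of the kind find_callers and closure might issue, wrapped in Python. The property names name and path are assumptions, not the documented schema, and in this sketch the hop depth is inlined as a validated integer rather than passed as a query parameter.

```python
# Hypothetical property names (name, path); the real schema may differ.
FIND_CALLERS = (
    "MATCH (caller:Function)-[:CALLS]->(callee:Function) "
    "WHERE callee.name = $name "
    "RETURN caller.name, caller.path ORDER BY caller.name"
)


def closure_query(depth: int) -> str:
    """Cypher for all functions reachable from $name in 1..depth CALLS hops."""
    # The cap of 30 is an arbitrary limit chosen for this sketch.
    if isinstance(depth, bool) or not isinstance(depth, int) or not 1 <= depth <= 30:
        raise ValueError("depth must be an integer between 1 and 30")
    # The bound is inlined (after validation) instead of being a $parameter.
    return (
        f"MATCH (f:Function)-[:CALLS*1..{depth}]->(g:Function) "
        "WHERE f.name = $name "
        "RETURN DISTINCT g.name"
    )
```

With the kuzu Python package, either string would run as conn.execute(FIND_CALLERS, {"name": "parse_file"}) on a kuzu.Connection opened over the project's database; parse_file is a placeholder function name.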
Qdrant — the vector store
Local-mode Qdrant, same story: embedded, file-backed, no server required. Holds 384-dimensional embeddings produced by a local sentence-transformer model (default: all-MiniLM-L6-v2).
Every node that has meaningful text (function body, doc section, transcript) is embedded and cross-indexed by node_id with the graph.
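One detail worth spelling out: Qdrant point IDs must be unsigned integers or UUIDs, so an arbitrary string node_id cannot serve as a point ID directly. A common approach, sketched here as an assumption (this page does not document Super-Brain's actual scheme), is to derive a deterministic UUIDv5 from the node_id and also keep the node_id in the point payload. The namespace UUID and the to_point helper are hypothetical.

```python
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
NODE_NAMESPACE = uuid.UUID("6f1c1c2e-4d8e-4b8a-9a53-0c6d2f3e9b10")
EMBEDDING_DIM = 384  # all-MiniLM-L6-v2 output size


def point_id_for(node_id: str) -> str:
    """Same node_id in, same point ID out, on every run and machine."""
    return str(uuid.uuid5(NODE_NAMESPACE, node_id))


def to_point(node_id: str, vector: list[float], text: str) -> dict:
    """Build a Qdrant-style point (id, vector, payload) cross-indexed by node_id."""
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dims, got {len(vector)}")
    return {
        "id": point_id_for(node_id),
        "vector": vector,
        "payload": {"node_id": node_id, "text": text},
    }
```

Because the ID is derived rather than stored, a vector hit can be mapped back to its graph node either by reading payload["node_id"] or by recomputing point_id_for on the graph side.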
Layer 4 — Intelligence
Job: answer questions by combining semantic similarity with structural traversal.
Hybrid retrieval
- Semantic seed: vector-search the query against Qdrant. Returns top-K candidates ranked by cosine similarity.
- Graph expansion: for each seed, walk the graph outward along CALLS, LINKED_TO, and REFERENCES edges to a configurable depth. Each hop applies a decay factor so structurally distant nodes don't swamp the result.
- Filter and rank: exclude external stubs, merge duplicates, and sort by decayed score.
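The three retrieval steps can be sketched in plain Python over in-memory data. This illustrates the technique rather than reproducing Super-Brain's retriever: the function and parameter names, the default k, depth, and decay values, following only outgoing edges, and keeping the best score when a node is reached more than once are all assumptions.

```python
import math
from collections import deque

EXPAND_EDGES = {"CALLS", "LINKED_TO", "REFERENCES"}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(query_vec, vectors, edges, external, k=2, depth=2, decay=0.5):
    """vectors: node_id -> embedding; edges: (src, edge_type, dst) triples;
    external: node_ids that are external stubs."""
    # 1. Semantic seed: top-K nodes by cosine similarity to the query.
    sims = {n: cosine(query_vec, v) for n, v in vectors.items()}
    seeds = sorted(sims, key=lambda n: (-sims[n], n))[:k]

    # Adjacency over the expansion edge types only (outgoing direction).
    adj = {}
    for src, edge_type, dst in edges:
        if edge_type in EXPAND_EDGES:
            adj.setdefault(src, []).append(dst)

    # 2. Graph expansion: breadth-first, score = seed similarity * decay**hops.
    best = {}
    for seed in seeds:
        queue, seen = deque([(seed, 0)]), {seed}
        while queue:
            node, hops = queue.popleft()
            score = sims[seed] * decay ** hops
            if node not in best or score > best[node]:
                best[node] = score  # merge duplicates: keep the best score
            if hops < depth:
                for nxt in adj.get(node, ()):
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append((nxt, hops + 1))

    # 3. Filter external stubs, then rank by decayed score (ties by id).
    ranked = [(n, s) for n, s in best.items() if n not in external]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))
```

With parse as the only seed (similarity 1.0) calling tokenize, which in turn calls lex and str.split, the result is parse at 1.0, tokenize at 0.5, and lex at 0.25; str.split is dropped as an external stub, and a node reachable only through IMPORTS never enters the result.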
The result is a ranked list of nodes with source paths, line numbers, and confidence — enough evidence to cite exactly why the answer is what it is.
Optional local LLM
For rich natural-language answers, an optional llama.cpp integration runs Llama-3.2-1B locally (~700 MB model file, downloaded once). The LLM receives only the ranked evidence bundle — never the whole codebase — and composes the prose.
Turning the LLM off gives you pure deterministic evidence. Turning it on adds phrasing, not facts.
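The hand-off between retrieval and the optional LLM can be pictured as a prompt built solely from the ranked evidence. The Evidence fields, the prompt wording, and max_items are hypothetical; the point of the sketch is that the model sees cited snippets, never the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    node_id: str
    path: str
    line: int
    score: float
    snippet: str


def build_prompt(question: str, evidence: list[Evidence], max_items: int = 5) -> str:
    """Compose an LLM prompt from the top of the ranked evidence bundle only."""
    lines = ["Answer using only the evidence below. Cite sources as [n].", ""]
    for i, ev in enumerate(evidence[:max_items], start=1):
        lines.append(f"[{i}] {ev.path}:{ev.line} ({ev.node_id}, score {ev.score:.2f})")
        lines.append(ev.snippet.strip())
        lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)
```

With the llama-cpp-python package, the resulting string could be sent as a user message to Llama.create_chat_completion; with the LLM turned off, the same evidence list is simply returned to the caller.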
Layer 5 — Interface
Job: expose the intelligence layer to humans and other tools.
Three entry points:
- CLI: agsuperbrain ask, search-vectors, query, inspect-function, and everything else in the CLI Reference.
- MCP server: agsuperbrain mcp-serve over stdio JSON-RPC. Ten tools: search_code, find_callers, find_callees, get_function_body, path_between, closure, get_subgraph, stats, list_modules, list_functions.
- Reports: agsuperbrain report writes a GRAPH_REPORT.md with god-nodes, cross-module dependencies, orphans, and suggested follow-up questions. The init command and the background watcher also regenerate .agsuperbrain/GRAPH_REPORT.md and .agsuperbrain/graph.html automatically after every re-index.
All three read from the same graph and vector store. There is no second copy of anything.
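At the wire level, an MCP client talks to agsuperbrain mcp-serve by writing JSON-RPC 2.0 messages to the server's stdin, one per line. The sketch below builds a tools/call request for find_callers; the argument name function_name is a placeholder, since each tool's real input schema is whatever the server advertises in its tools/list response. A real client also has to complete the MCP initialize handshake before calling tools.

```python
import json
from itertools import count

_request_ids = count(1)


def tools_call(tool: str, arguments: dict) -> str:
    """Serialize one MCP tools/call request as a newline-terminated stdio frame."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps escapes any newlines inside strings, so the frame is one line.
    return json.dumps(message, separators=(",", ":")) + "\n"


frame = tools_call("find_callers", {"function_name": "parse_file"})
```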
Data flow end to end
```
Source files  ──► tree-sitter ──► Function/Call nodes ──┐
                                                        ├──► KùzuDB ──┐
PDFs / docs   ──► MarkItDown  ──► Section nodes ────────┤             │
                                                        │             ├──► Hybrid retriever ──► MCP / CLI
Audio / video ──► Whisper     ──► Transcript nodes ─────┘             │
                                                                      │
Node text ──► sentence-transformers ──► 384-dim vectors ──► Qdrant ───┘
```
Ingestion is one-way and additive. Query-time reads are cheap. The file watcher re-runs the ingestion pipeline only against changed files, keeping the graph fresh without a full rebuild.
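This page does not say how the watcher decides which files changed. One common approach, sketched here as an assumption, is to keep a content-hash manifest and diff it against the previous one so that only added or modified files go back through ingestion. fingerprint and changed_files are hypothetical helpers.

```python
import hashlib
from pathlib import Path


def fingerprint(paths) -> dict:
    """Map each path to the SHA-256 of its current contents."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}


def changed_files(previous: dict, current: dict):
    """Diff two manifests; only `added` and `modified` need re-ingesting."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    modified = sorted(
        p for p in current.keys() & previous.keys() if current[p] != previous[p]
    )
    return added, modified, removed
```

Hashing contents rather than comparing modification times means that saving a file without changing its bytes does not trigger a re-index.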
Determinism guarantees
The architecture is built to keep surprises out:
- No LLM in ingestion. Call edges, class hierarchies, imports — all come from AST walks. An LLM cannot invent a relationship that isn't in the code.
- Stable node IDs. IDs are derived from source path + qualified name, so re-ingesting the same code produces the same graph.
- Transactional writes. KùzuDB is ACID. A crash mid-ingest leaves the previous graph intact.
- Idempotent upserts. Re-running ingest on the same files updates changed bodies and leaves everything else alone.
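The stable-ID and idempotent-upsert guarantees can be sketched together. Hashing the source path plus qualified name with SHA-256, truncating to 16 hex characters, normalizing path separators, and using an in-memory dict as a stand-in for the graph are all assumptions made for illustration; only the inputs to the ID (source path and qualified name) come from the description above.

```python
import hashlib


def node_id(source_path: str, qualified_name: str) -> str:
    """Stable ID: the same path and qualified name always give the same ID."""
    normalized = source_path.replace("\\", "/")  # Windows and POSIX agree
    key = f"{normalized}::{qualified_name}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()[:16]


def upsert(store: dict, source_path: str, qualified_name: str, body: str) -> str:
    """Insert or update one node; return 'inserted', 'updated' or 'unchanged'."""
    nid = node_id(source_path, qualified_name)
    old = store.get(nid)
    if old is None:
        store[nid] = body
        return "inserted"
    if old == body:
        return "unchanged"  # re-ingesting identical code is a no-op
    store[nid] = body
    return "updated"
```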
Performance characteristics
Measured on a self-corpus pilot (the agsuperbrain repository itself, 131 files, 86 Tier-1 source files; x86_64, 8 vCPU under WSL2, ext4). Reproducible via paper/evaluation/run_eval.py.
Query latency (n=30 per primitive, 3 warmup):
| Operation | p50 | p95 |
|---|---|---|
| find_callers (Cypher graph traversal) | 2.0 ms | 2.5 ms |
| find_callees (Cypher graph traversal) | 2.3 ms | 3.1 ms |
| closure (depth 3, transitive) | 4.3 ms | 7.4 ms |
| embed_only (warm sentence-transformer) | 7.1 ms | 9.0 ms |
| search_code (hybrid: vector seed + graph expansion) | 491 ms | 781 ms |
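A latency harness in the spirit of the table (30 timed runs after 3 warmups, reporting p50 and p95) might look like the following. It is not the paper's run_eval.py; in particular, the percentile method (statistics.quantiles with its default exclusive interpolation) is an assumption and may differ slightly from the one behind the published numbers.

```python
import statistics
import time


def percentiles(samples_ms):
    """Return (p50, p95) from statistics.quantiles' 99 cut points."""
    cuts = statistics.quantiles(samples_ms, n=100)
    return cuts[49], cuts[94]


def bench(fn, n=30, warmup=3):
    """Time `fn` n times after `warmup` untimed calls; results in milliseconds."""
    for _ in range(warmup):
        fn()  # warm caches and models; not recorded
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return percentiles(samples)
```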
Cold-start cost (one-time, then incremental):
| Phase | Wall time |
|---|---|
| Code ingest (Pass A indexing + Pass B call resolution) | 488.7 s |
| Document ingest (.md, .html, .pdf, .docx, .pptx) | 34.5 s |
| Vector indexing (2,973 nodes, 384-dim) | 109.4 s |
Storage footprint: 22.9 MiB total on disk (KùzuDB graph + Qdrant vector store) for 5.4 MiB of indexed source/doc content.
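As a rough sanity check on the footprint, assume the vectors are stored as uncompressed float32 (4 bytes per dimension) and ignore Qdrant's index structures and payloads. Under that assumption the raw embeddings account for a little under a fifth of the total on disk, and the index as a whole is about four times the size of the content it covers:

$$
2{,}973 \times 384 \times 4\ \text{B} = 4{,}566{,}528\ \text{B} \approx 4.35\ \text{MiB},
\qquad
\frac{4.35\ \text{MiB}}{22.9\ \text{MiB}} \approx 19\%,
\qquad
\frac{22.9\ \text{MiB}}{5.4\ \text{MiB}} \approx 4.2\times
$$

Under the same assumption, the remainder is the KùzuDB graph plus Qdrant's index and payload overhead.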
Steady state: the file watcher debounces changes (default 400 ms) and re-indexes only the affected files — typically milliseconds per change. Pass-B call resolution on large repos is the primary candidate for optimisation in upcoming releases.
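Debouncing collapses a burst of saves into one re-index. Modeled offline as a pure function over timestamped events (a hypothetical helper, not the watcher's actual code), a 400 ms window groups changes as follows: a batch closes once 400 ms pass with no new event, and each batch lists every affected file once.

```python
def debounce_batches(events, window_ms=400):
    """Group (timestamp_ms, path) events into deduplicated re-index batches."""
    batches, current, last_ts = [], [], None
    for ts, path in sorted(events):
        # A quiet gap of at least window_ms closes the pending batch.
        if last_ts is not None and ts - last_ts >= window_ms:
            batches.append(sorted(set(current)))
            current = []
        current.append(path)
        last_ts = ts
    if current:
        batches.append(sorted(set(current)))
    return batches
```

For example, saves of a.py at 0 ms and 150 ms plus b.py at 300 ms form one batch, while a further save of a.py at 1000 ms starts a second one.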
Latency and throughput vary with file size, language, and hardware. Pilot scope and full caveats are in the paper.
What's intentionally out of scope
Super-Brain is not:
- A code generator — it indexes code, it doesn't write code
- A cloud service — there is no Super-Brain server to log into
- A replacement for your assistant — it makes your assistant smarter, not redundant
- A type checker — it extracts structural facts, not type correctness
Next steps
- CLI Reference — commands built on this architecture
- IDE Integration — how the architecture plugs into Claude Code / Cursor / Aider / etc.
- Why Super-Brain — what this architecture solves that standalone assistants can't