Part 6: HippoRAG First Principles
You've seen the pieces. Now let's put them together.
HippoRAG is a retrieval system that mimics how your hippocampus indexes and retrieves memories. Instead of treating documents as isolated vectors, it builds a graph of associations and retrieves by spreading activation.
The two phases
Offline (Indexing): Build the knowledge graph from your documents.
Online (Retrieval): Given a query, spread activation through the graph to find relevant passages.
Phase 1: Indexing
Documents
↓
[Extract triples via LLM]
↓
("Metformin", "TREATS", "Diabetes")
("Metformin", "RISK_WITH", "Renal Impairment")
↓
[Build knowledge graph]
↓
Nodes: Metformin, Diabetes, Renal Impairment, ...
Edges: TREATS, RISK_WITH, ...
↓
[Link nodes to source passages]
↓
Metformin node → Passage 1, Passage 7
↓
[Embed nodes for matching]
↓
Ready for retrieval
The key insight: extraction creates discrete representations. "Metformin" is its own node, not a point in embedding space that might blur with similar drugs. This is pattern separation: distinct concepts get distinct representations.
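The indexing flow above can be sketched in a few lines of plain Python. This is a minimal illustration, not the qortex or HippoRAG implementation: the `extracted` list stands in for whatever the LLM extraction step returns, the passage IDs and the `build_index` helper are hypothetical, and the node-embedding step is left out.

```python
# Hypothetical output of the LLM extraction step:
# (passage_id, (subject, relation, object)). Passage IDs are illustrative.
extracted = [
    ("passage_1", ("Metformin", "TREATS", "Diabetes")),
    ("passage_7", ("Metformin", "RISK_WITH", "Renal Impairment")),
]

def build_index(extracted):
    """Build (edges, node_to_passages) from extracted triples."""
    edges = {}             # node -> set of (relation, neighbor)
    node_to_passages = {}  # node -> set of passage IDs that mention it
    for passage_id, (subj, rel, obj) in extracted:
        edges.setdefault(subj, set()).add((rel, obj))
        edges.setdefault(obj, set())  # object-only nodes still become nodes
        node_to_passages.setdefault(subj, set()).add(passage_id)
        node_to_passages.setdefault(obj, set()).add(passage_id)
    return edges, node_to_passages

edges, node_to_passages = build_index(extracted)
print(sorted(edges))                          # every concept is its own node
print(sorted(node_to_passages["Metformin"]))  # passages linked to one node
```

Each concept becomes exactly one dictionary key, which is the discrete representation described above: "Metformin" cannot blur into a neighboring drug the way two nearby vectors can.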
Phase 2: Retrieval
Query: "metformin and kidney problems"
↓
[Extract query entities]
↓
["metformin", "kidney"]
↓
[Match to graph nodes via embedding similarity]
↓
Metformin node (0.95 match)
Renal Impairment node (0.87 match for "kidney")
↓
[Run Personalized PageRank from matched nodes]
↓
High scores: Metformin, Renal Impairment, Lactic Acidosis, ...
↓
[Rank passages by sum of their nodes' scores]
↓
Passage 3 (contains Metformin + Lactic Acidosis): score 1.7
Passage 1 (contains Metformin only): score 0.9
↓
[Return top passages to LLM]
The retrieval found Lactic Acidosis (a crucial concept for this query) even though the query never mentioned it. Pattern completion discovered the connection.
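The retrieval steps can be sketched end to end as follows. Everything here is an assumption made for illustration: the toy 4-dimensional vectors stand in for real embeddings, the graph edges (including the ones touching Lactic Acidosis) are invented to match the example, and the helper names are hypothetical. Personalized PageRank is written as a plain power iteration with a damping factor of 0.85, so the absolute scores will not match the illustrative numbers in the diagram.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def match_nodes(query_vecs, node_vecs):
    """Map each query entity to its most similar graph node: {node: similarity}."""
    seeds = {}
    for vec in query_vecs.values():
        best = max(node_vecs, key=lambda name: cosine(vec, node_vecs[name]))
        seeds[best] = max(seeds.get(best, 0.0), cosine(vec, node_vecs[best]))
    return seeds

def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power iteration; random restarts jump back to the seed nodes only.

    Assumes an undirected graph in which every node has at least one neighbor.
    """
    total = sum(seeds.values())
    restart = {n: seeds.get(n, 0.0) / total for n in graph}
    scores = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1 - damping) * restart[n] for n in graph}
        for node, neighbors in graph.items():
            share = damping * scores[node] / len(neighbors)
            for nb in neighbors:
                nxt[nb] += share
        scores = nxt
    return scores

def rank_passages(passage_nodes, scores):
    """Order passages by the summed PPR score of the nodes they contain."""
    totals = {p: sum(scores.get(n, 0.0) for n in nodes)
              for p, nodes in passage_nodes.items()}
    return sorted(totals, key=totals.get, reverse=True)

# Toy data (all hypothetical): undirected adjacency lists and fake embeddings.
graph = {
    "Metformin": ["Diabetes", "Renal Impairment", "Lactic Acidosis"],
    "Diabetes": ["Metformin"],
    "Renal Impairment": ["Metformin", "Lactic Acidosis"],
    "Lactic Acidosis": ["Metformin", "Renal Impairment"],
}
node_vecs = {
    "Metformin": [1.0, 0.0, 0.0, 0.0],
    "Diabetes": [0.0, 1.0, 0.0, 0.0],
    "Renal Impairment": [0.0, 0.0, 1.0, 0.0],
    "Lactic Acidosis": [0.0, 0.0, 0.0, 1.0],
}
query_vecs = {"metformin": [0.9, 0.1, 0.0, 0.0], "kidney": [0.0, 0.0, 0.8, 0.3]}
passages = {
    "passage_1": ["Metformin"],
    "passage_3": ["Metformin", "Lactic Acidosis"],
}

seeds = match_nodes(query_vecs, node_vecs)
scores = personalized_pagerank(graph, seeds)
ranking = rank_passages(passages, scores)
print(seeds)
print(ranking)
```

Lactic Acidosis receives no restart mass because the query never mentions it, yet it still accumulates score from both seed nodes through its edges. That is why the passage containing it outranks the passage that mentions Metformin alone.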
The brain analogy
| Brain Component | HippoRAG | Function |
|---|---|---|
| Neocortex | Documents + LLM | Stores actual content |
| Hippocampal index | Knowledge graph | Network of associations |
| Pattern separation | Triple extraction | Distinct representations |
| Pattern completion | Personalized PageRank | Spread from partial cues |
Under the hippocampal indexing view that HippoRAG borrows from, the hippocampus doesn't store memories. It indexes them. The neocortex stores the content; the hippocampus stores the associations between fragments.
HippoRAG does the same. The LLM and documents are the "neocortex." The knowledge graph is the "hippocampal index."
Why it beats standard RAG
| Aspect | Standard RAG | HippoRAG |
|---|---|---|
| Representation | Dense vectors | Graph + embeddings |
| Retrieval | Nearest neighbor | PPR on graph |
| Multi-hop | Fails in a single pass (needs iterative retrieval) | Single pass |
| Cost for multi-hop | 10-20x more LLM calls (with iteration) | Same as single-hop |
| Speed for multi-hop | 6-13x slower (with iteration) | Same as single-hop |
Standard RAG asks: "What documents look like this query?"
HippoRAG asks: "What concepts connect to this query's concepts?"
The second question is the right question for multi-hop reasoning.
What qortex provides
qortex is the indexing layer. It builds the knowledge graph that HippoRAG retrieves from:
- Ingest: Documents become structured manifests
- Store: Concepts and edges go into GraphBackend
- Project: Rules for downstream consumers (buildlog tests them)
- Retrieve (roadmap Phase 2): PPR-based pattern completion
The scaffolding for retrieval exists in src/qortex/hippocampus/. Full implementation is planned for Phase 2 of the roadmap.
What you learned
- HippoRAG has two phases: offline indexing, online retrieval
- Indexing extracts triples and builds a knowledge graph
- Retrieval matches query to nodes, then spreads activation via PPR
- Pattern completion finds relevant concepts the query never mentioned
- qortex provides the indexing layer; retrieval is planned for Phase 2 of the roadmap
Next steps
Ready to use this? Head to the Quick Start to ingest your first content, build a graph, and project rules.