Part 6: HippoRAG First Principles

You've seen the pieces. Now let's put them together.

HippoRAG is a retrieval system that mimics how your hippocampus indexes and retrieves memories. Instead of treating documents as isolated vectors, it builds a graph of associations and retrieves by spreading activation.

The two phases

Offline (Indexing): Build the knowledge graph from your documents.

Online (Retrieval): Given a query, spread activation through the graph to find relevant passages.

Phase 1: Indexing

Documents
    ↓
[Extract triples via LLM]
    ↓
("Metformin", "TREATS", "Diabetes")
("Metformin", "RISK_WITH", "Renal Impairment")
    ↓
[Build knowledge graph]
    ↓
Nodes: Metformin, Diabetes, Renal Impairment, ...
Edges: TREATS, RISK_WITH, ...
    ↓
[Link nodes to source passages]
    ↓
Metformin node → Passage 1, Passage 7
    ↓
[Embed nodes for matching]
    ↓
Ready for retrieval

The key insight: extraction creates discrete representations. "Metformin" is its own node, not a point in embedding space that might blur with similar drugs. This is pattern separation: distinct concepts get distinct representations.
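The indexing steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the qortex or HippoRAG implementation: the triples are hard-coded where an LLM extraction step would run, the passage IDs are made up, `build_index` is a hypothetical helper name, and the node-embedding step is left out.

```python
from collections import defaultdict

# Hypothetical output of the LLM extraction step:
# (passage_id, (subject, relation, object)). Values are illustrative only.
EXTRACTED = [
    (1, ("Metformin", "TREATS", "Diabetes")),
    (7, ("Metformin", "RISK_WITH", "Renal Impairment")),
]


def build_index(extracted):
    """Build a tiny knowledge graph plus a node -> passages map."""
    nodes = set()
    edges = []                        # (subject, relation, object)
    node_passages = defaultdict(set)  # node -> ids of passages that mention it
    for passage_id, (subj, rel, obj) in extracted:
        # Each concept becomes exactly one discrete node, however often it appears.
        nodes.update((subj, obj))
        edges.append((subj, rel, obj))
        node_passages[subj].add(passage_id)
        node_passages[obj].add(passage_id)
    return nodes, edges, dict(node_passages)


nodes, edges, node_passages = build_index(EXTRACTED)
print(sorted(nodes))
print(sorted(node_passages["Metformin"]))
```

Because nodes live in a set keyed by name, "Metformin" stays a single node no matter how many passages mention it; a real pipeline would also need to merge synonyms and spelling variants, which this sketch does not attempt.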

Phase 2: Retrieval

Query: "metformin and kidney problems"
    ↓
[Extract query entities]
    ↓
["metformin", "kidney"]
    ↓
[Match to graph nodes via embedding similarity]
    ↓
Metformin node (0.95 match)
Renal Impairment node (0.87 match for "kidney")
    ↓
[Run Personalized PageRank from matched nodes]
    ↓
High scores: Metformin, Renal Impairment, Lactic Acidosis, ...
    ↓
[Rank passages by sum of their nodes' scores]
    ↓
Passage 3 (contains Metformin + Lactic Acidosis): score 1.7
Passage 1 (contains Metformin only): score 0.9
    ↓
[Return top passages to LLM]

The retrieval found Lactic Acidosis (a crucial concept for this query) even though the query never mentioned it. Pattern completion discovered the connection.
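The spreading-activation step can be sketched with a small power-iteration version of Personalized PageRank. Everything here is an assumption made for illustration: the toy graph, the passage-to-node map, the seed weights (reusing the match scores from the walkthrough), the damping value of 0.5, and the helper names `personalized_pagerank` and `rank_passages`. Entity extraction and embedding matching are assumed to have already produced the seeds.

```python
from collections import defaultdict

# Undirected toy graph; edges are illustrative, not from a real corpus.
EDGES = [
    ("Metformin", "Diabetes"),
    ("Metformin", "Renal Impairment"),
    ("Metformin", "Lactic Acidosis"),
    ("Renal Impairment", "Lactic Acidosis"),
    ("Diabetes", "Insulin"),
]

# Which nodes each passage mentions (illustrative).
PASSAGE_NODES = {
    1: {"Metformin"},
    3: {"Metformin", "Lactic Acidosis"},
    5: {"Insulin"},
}


def personalized_pagerank(edges, seeds, damping=0.5, iterations=50):
    """Power iteration for PPR. `seeds` maps graph nodes to restart weights.

    `damping` is the probability of following an edge; with probability
    1 - damping the walk restarts at a seed node. Seeds must be graph nodes.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    total = sum(seeds.values())
    restart = {n: seeds.get(n, 0.0) / total for n in neighbors}
    scores = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in neighbors}
        for node, score in scores.items():
            share = damping * score / len(neighbors[node])
            for other in neighbors[node]:
                nxt[other] += share
        scores = nxt
    return scores


def rank_passages(passage_nodes, scores):
    """Score each passage by the sum of its nodes' PPR scores, best first."""
    totals = {
        pid: sum(scores.get(n, 0.0) for n in mentioned)
        for pid, mentioned in passage_nodes.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Seeds: matched query nodes, weighted by their match scores.
seeds = {"Metformin": 0.95, "Renal Impairment": 0.87}
scores = personalized_pagerank(EDGES, seeds)
ranked = rank_passages(PASSAGE_NODES, scores)
for pid, total in ranked:
    print(pid, round(total, 3))
```

In this toy graph, "Lactic Acidosis" receives score from both seed nodes although it is not a seed itself, so the passage that mentions it outranks the one that mentions only Metformin. Note that normalized PPR scores sum to 1 across all nodes, so the absolute numbers will not match the illustrative scores in the walkthrough; only the ordering is the point.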

The brain analogy

Brain Component	HippoRAG	Function
Neocortex	Documents + LLM	Stores actual content
Hippocampal index	Knowledge graph	Network of associations
Pattern separation	Triple extraction	Distinct representations
Pattern completion	Personalized PageRank	Spread from partial cues

Under hippocampal indexing theory, the hippocampus doesn't store memories itself. It indexes them. The neocortex stores the content; the hippocampus stores the associations between fragments.

HippoRAG does the same. The LLM and documents are the "neocortex." The knowledge graph is the "hippocampal index."

Why it beats standard RAG

Aspect	Standard RAG	HippoRAG
Representation	Dense vectors	Graph + embeddings
Retrieval	Nearest neighbor	PPR on graph
Multi-hop	Fails in a single pass (needs iterative retrieval)	Single pass
Cost for multi-hop	10-20x more LLM calls (when iterating)	Same as single-hop
Speed for multi-hop	6-13x slower (when iterating)	Same as single-hop

Standard RAG asks: "What documents look like this query?"

HippoRAG asks: "What concepts connect to this query's concepts?"

The second question is the right question for multi-hop reasoning.

What qortex provides

qortex is the indexing layer. It builds the knowledge graph that HippoRAG retrieves from:

Ingest: Documents become structured manifests
Store: Concepts and edges go into GraphBackend
Project: Rules for downstream consumers (buildlog tests them)
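Retrieve (planned, roadmap Phase 2): PPR-based pattern completion

The first three steps are available today. For retrieval, only the scaffolding exists, in src/qortex/hippocampus/. The full implementation is planned for Phase 2 of the qortex roadmap (not to be confused with "Phase 2: Retrieval" in the walkthrough above).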

What you learned

HippoRAG has two phases: offline indexing, online retrieval
Indexing extracts triples and builds a knowledge graph
Retrieval matches query to nodes, then spreads activation via PPR
Pattern completion finds relevant concepts the query never mentioned
qortex provides the indexing layer; PPR-based retrieval is planned for Phase 2 of its roadmap

Next steps

Ready to use this? Head to the Quick Start to ingest your first content, build a graph, and project rules.