Theory: From RAG to HippoRAG
Why does your retrieval system fail when the answer spans multiple documents?
The short version: vector similarity finds documents that look like your query. But sometimes you need documents that are connected to your query through concepts the embedding never learned to represent.
This is the multi-hop problem. And the fix is surprisingly intuitive once you see it.
The idea in 30 seconds
Standard RAG embeds documents as vectors and retrieves by similarity. Works great for single-hop questions ("What is X?"). Falls apart for multi-hop questions ("What happens when X interacts with Y in context Z?").
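To make the failure concrete, here is a minimal sketch of similarity-only retrieval. Everything in it is an assumption for illustration: the two-document corpus, the query, and especially `embed`, which is a bag-of-words stand-in rather than a learned embedding model. Real embeddings are fuzzier than word counts, but the gap has the same shape.

```python
import math
from collections import Counter

# Hypothetical toy corpus: the answer needs both documents,
# but only doc_a shares any words with the query.
docs = {
    "doc_a": "drug alpha inhibits enzyme beta",
    "doc_b": "low enzyme beta causes dizziness",
}
query = "side effects of drug alpha"


def embed(text):
    """Stand-in embedding (assumption): a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[word] * v[word] for word in u)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Score every document against the query, as a similarity-only retriever would.
scores = {name: cosine(embed(query), embed(text)) for name, text in docs.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

`doc_a` gets a positive score because it mentions the drug. `doc_b`, which holds the actual side effect, scores zero: it is connected to the query only through "enzyme beta", a concept the query never mentions.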
The fix: build a graph of concepts and relationships alongside your embeddings. When a query arrives, match it to concepts in the graph, then spread activation through connected nodes. Retrieve documents linked to the high-scoring nodes.
That's HippoRAG, named after the hippocampus, which is thought to do something very similar for memory retrieval.
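The graph-then-spread idea can be sketched in a few lines. This is a simplified illustration, not qortex's implementation and not the exact HippoRAG algorithm: the graph, the `concept_docs` mapping, the substring-based concept matching, and the function names are all hypothetical. Spreading activation is approximated here with a plain power-iteration Personalized PageRank.

```python
# Hypothetical concept graph: concept -> neighboring concepts (edges listed both ways).
graph = {
    "drug alpha": ["enzyme beta"],
    "enzyme beta": ["drug alpha", "dizziness"],
    "dizziness": ["enzyme beta"],
}
# Hypothetical index: concept -> documents that mention it.
concept_docs = {
    "drug alpha": ["doc_a"],
    "enzyme beta": ["doc_a", "doc_b"],
    "dizziness": ["doc_b"],
}


def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power iteration where the random walk restarts at the seed concepts."""
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in graph}
    scores = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1 - damping) * restart[n] for n in graph}
        for node, neighbors in graph.items():
            if not neighbors:
                continue  # simplification: an isolated node passes nothing on
            share = damping * scores[node] / len(neighbors)
            for neighbor in neighbors:
                nxt[neighbor] += share
        scores = nxt
    return scores


def retrieve(query, graph, concept_docs):
    """Match query to concepts, spread activation, rank linked documents."""
    # Assumption: naive substring matching stands in for real concept extraction.
    seeds = [concept for concept in graph if concept in query.lower()]
    if not seeds:
        return []
    scores = personalized_pagerank(graph, seeds)
    doc_scores = {}
    for concept, score in scores.items():
        for doc in concept_docs.get(concept, []):
            doc_scores[doc] = doc_scores.get(doc, 0.0) + score
    return sorted(doc_scores, key=doc_scores.get, reverse=True)


results = retrieve("side effects of drug alpha", graph, concept_docs)
print(results)
```

The query only names "drug alpha", yet activation flows through "enzyme beta" to "dizziness", so `doc_b` is retrieved alongside `doc_a` even though it shares no words with the query.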
qortex builds the graph. These tutorials explain why it works.
The series
| Tutorial | What it covers |
|---|---|
| The Multi-Hop Problem | Why similarity isn't association |
| Knowledge Graphs 101 | Concepts, edges, semantic types |
| The Projection Pipeline | Graph → Rules via Source → Enricher → Target |
| The Consumer Loop | Rules as hypotheses; measuring what works |
| Pattern Completion | Personalized PageRank and spreading activation |
| HippoRAG First Principles | The full algorithm: index with graphs, retrieve with PPR |
What these tutorials are (and aren't)
These are intentionally light. You'll get working intuition, enough to use qortex and understand what it's doing. You won't get rigorous math or deep theory.
For the full treatment (probability from first principles, the linear algebra behind PageRank, information geometry for embeddings), there's Aegir. It's an in-progress curriculum I'm building alongside my own learning journey. Think of it as a super-notebook that'll become a proper book over the next year or two.
Prerequisites
- Python basics (functions, classes, dicts)
- Comfort with `pip install` and running scripts
- No ML/AI background required
Ready?
Start with The Multi-Hop Problem, a 2am hospital story about a two-million-dollar system that couldn't answer a simple question.