Theory: From RAG to HippoRAG

Why does your retrieval system fail when the answer spans multiple documents?

The short version: vector similarity finds documents that look like your query. But sometimes you need documents that are connected to your query through concepts the embedding never learned to represent.

This is the multi-hop problem. And the fix is surprisingly intuitive once you see it.

The idea in 30 seconds

Standard RAG embeds documents as vectors and retrieves by similarity. Works great for single-hop questions ("What is X?"). Falls apart for multi-hop questions ("What happens when X interacts with Y in context Z?").

The fix: build a graph of concepts and relationships alongside your embeddings. When a query arrives, match it to concepts in the graph, then spread activation through connected nodes. Retrieve documents linked to the high-scoring nodes.

That's HippoRAG, named after your hippocampus, which does exactly this for memory retrieval.

qortex builds the graph. These tutorials explain why it works.

The series

Tutorial What it covers
The Multi-Hop Problem Why similarity isn't association
Knowledge Graphs 101 Concepts, edges, semantic types
The Projection Pipeline Graph → Rules via Source → Enricher → Target
The Consumer Loop Rules as hypotheses; measuring what works
Pattern Completion Personalized PageRank and spreading activation
HippoRAG First Principles The full algorithm: index with graphs, retrieve with PPR

What these tutorials are (and aren't)

These are intentionally light. You'll get working intuition, enough to use qortex and understand what it's doing. You won't get rigorous math or deep theory.

For the full treatment (probability from first principles, the linear algebra behind PageRank, information geometry for embeddings), there's Aegir. It's an in-progress curriculum I'm building alongside my own learning journey. Think of it as a super-notebook that'll become a proper book over the next year or two.

Prerequisites

  • Python basics (functions, classes, dicts)
  • Comfort with pip install and running scripts
  • No ML/AI background required

Ready?

Start with The Multi-Hop Problem, a 2am hospital story about a two-million-dollar system that couldn't answer a simple question.