qortex-ingest
Pluggable document ingestion for qortex: extract concepts, relations, and rules from any source into a knowledge graph.
Install
pip install qortex-ingest
With optional extras (extraction backends, PDF support):
pip install "qortex-ingest[anthropic]" # Claude API extraction
pip install "qortex-ingest[pdf]" # PDF support (pymupdf + pdfplumber)
pip install "qortex-ingest[all]" # everything
Quick Start
from qortex.ingest import IngestionManifest
from qortex.ingest.text import TextIngestor
from qortex.ingest.backends import get_extraction_backend
# Auto-detect best available backend (Anthropic > Ollama > Stub)
backend = get_extraction_backend()
ingestor = TextIngestor(backend=backend)
manifest: IngestionManifest = ingestor.ingest(
    source_path="notes.txt",
    domain="my-project",
)
print(f"Extracted {len(manifest.concepts)} concepts, {len(manifest.edges)} relations")
What It Does
qortex-ingest converts documents into structured knowledge graph components:
- Chunk: Split the source according to its format (paragraphs, headings, sentences)
- Extract: Two-pass LLM extraction. The first pass finds generalizable concepts; the second finds illustrative examples and reconciles them onto their parent concepts
- Relate: 10 relation types: REQUIRES, USES, REFINES, IMPLEMENTS, PART_OF, SIMILAR_TO, ALTERNATIVE_TO, SUPPORTS, CHALLENGES, CONTRADICTS
- Assemble: Output a single IngestionManifest (the universal contract)
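The ten relation types listed above form a closed vocabulary, which an enum captures naturally. The sketch below is illustrative only: `RelationType` and `parse_relation` are hypothetical names, not the qortex API, and only the ten member names come from this README.

```python
from enum import Enum


class RelationType(str, Enum):
    """Hypothetical enum; members mirror the ten relation types in this README."""

    REQUIRES = "REQUIRES"
    USES = "USES"
    REFINES = "REFINES"
    IMPLEMENTS = "IMPLEMENTS"
    PART_OF = "PART_OF"
    SIMILAR_TO = "SIMILAR_TO"
    ALTERNATIVE_TO = "ALTERNATIVE_TO"
    SUPPORTS = "SUPPORTS"
    CHALLENGES = "CHALLENGES"
    CONTRADICTS = "CONTRADICTS"


def parse_relation(label: str) -> RelationType:
    """Normalize a free-form label (e.g. 'part of') to a RelationType.

    Raises ValueError if the label is not one of the ten known types.
    """
    key = label.strip().upper().replace(" ", "_").replace("-", "_")
    return RelationType(key)
```

Rejecting unknown labels early is one way to keep LLM output from introducing relation types the graph does not understand.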
Ingestors
| Ingestor | Format | Chunking Strategy |
|---|---|---|
| TextIngestor | Plain text | Fixed-size with configurable overlap |
| MarkdownIngestor | Markdown | By heading hierarchy, preserves structure |
| SentenceBoundaryChunker | Online/streaming | Regex sentence boundaries, SHA256 IDs |
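To make "fixed-size with configurable overlap" concrete, here is a minimal sketch of that windowing idea. It is not TextIngestor's actual implementation: `fixed_size_chunks` is a hypothetical helper, and it assumes the input has already been split into tokens (whitespace-separated words in the usage line).

```python
def fixed_size_chunks(
    tokens: list[str],
    size: int = 256,
    overlap: int = 32,
) -> list[list[str]]:
    """Slide a window of `size` tokens, repeating `overlap` tokens between windows."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap  # how far the window advances each time
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


windows = fixed_size_chunks("a b c d e f g h i j".split(), size=4, overlap=1)
```

With `size=4` and `overlap=1`, each window starts three tokens after the previous one, so the last token of one chunk reappears as the first token of the next.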
Pluggable Chunkers
Any callable matching ChunkingStrategy can replace the default:
from qortex.online.chunker import Chunk
def my_chunker(
    text: str,
    max_tokens: int = 256,
    overlap_tokens: int = 32,
    source_id: str = "",
) -> list[Chunk]:
    # Your custom chunking logic (tiktoken, semantic, etc.)
    ...
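As one possible body for such a callable, the sketch below packs regex-split sentences into chunks and derives IDs from a SHA256 digest. Several things here are assumptions: the local `Chunk` dataclass is a stand-in whose fields (`id`, `text`, `source_id`) may not match `qortex.online.chunker.Chunk`, tokens are approximated by whitespace-separated words, and the ID format is invented for illustration.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Stand-in for qortex.online.chunker.Chunk; these fields are assumptions."""

    id: str
    text: str
    source_id: str


# Split after '.', '!' or '?' when followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def _make_chunk(words: list[str], source_id: str) -> Chunk:
    body = " ".join(words)
    digest = hashlib.sha256(f"{source_id}:{body}".encode("utf-8")).hexdigest()
    return Chunk(id=digest[:16], text=body, source_id=source_id)


def sentence_chunker(
    text: str,
    max_tokens: int = 256,
    overlap_tokens: int = 32,
    source_id: str = "",
) -> list[Chunk]:
    """Pack whole sentences into chunks, carrying trailing words over as overlap."""
    sentences = [s for s in _SENTENCE_END.split(text.strip()) if s]
    chunks: list[Chunk] = []
    current: list[str] = []  # words of the chunk being built
    for sentence in sentences:
        words = sentence.split()
        if current and len(current) + len(words) > max_tokens:
            chunks.append(_make_chunk(current, source_id))
            # Seed the next chunk with the tail of the previous one.
            current = current[-overlap_tokens:] if overlap_tokens > 0 else []
        current.extend(words)
    if current:
        chunks.append(_make_chunk(current, source_id))
    return chunks
```

Because the ID depends only on the source ID and the chunk text, re-chunking the same input yields the same IDs. This sketch never splits a sentence, so a single sentence longer than `max_tokens` produces an oversized chunk.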
Extraction Backends
| Backend | Cost | Features |
|---|---|---|
| AnthropicExtractionBackend | ~$0.60/57KB | Full extraction: concepts, relations, rules, code examples |
| OllamaExtractionBackend | Free (local) | Concepts, relations, rules (no code examples) |
| StubLLMBackend | Free | Testing only; returns configured fixtures |
Auto-detection priority: Anthropic (if ANTHROPIC_API_KEY set) > Ollama (if reachable) > Stub.
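The priority order above amounts to a short fall-through check. The sketch below only illustrates that ordering and is not how `get_extraction_backend` is written: `choose_backend_name` is a hypothetical helper, the reachability probe is injected as a callable, and treating an empty `ANTHROPIC_API_KEY` as unset is an assumption.

```python
import os
from collections.abc import Callable, Mapping


def choose_backend_name(
    env: Mapping[str, str] = os.environ,
    ollama_reachable: Callable[[], bool] = lambda: False,
) -> str:
    """Return 'anthropic', 'ollama', or 'stub' following the documented priority."""
    if env.get("ANTHROPIC_API_KEY"):  # assumption: an empty value counts as unset
        return "anthropic"
    if ollama_reachable():  # only probed when no API key is present
        return "ollama"
    return "stub"
```

Passing the environment and the probe as parameters keeps the decision testable without a real API key or a running Ollama server.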
Output: IngestionManifest
The manifest is the universal contract between ingestion and the knowledge graph:
@dataclass
class IngestionManifest:
    source: SourceMetadata            # origin info + stats
    domain: str                       # knowledge domain name
    concepts: list[ConceptNode]       # extracted concepts with embeddings
    edges: list[ConceptEdge]          # typed relations between concepts
    rules: list[ExplicitRule]         # best practices, warnings, principles
    code_examples: list[CodeExample]  # linked to concepts and rules
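Since edges are typed relations between concepts, a consumer may want to confirm that every edge points at a concept present in the same manifest. The following is a sketch under loud assumptions: the `ConceptNode` and `ConceptEdge` classes here are simplified stand-ins, and the field names `id`, `source_id`, `target_id`, and `relation_type` are hypothetical rather than taken from the qortex core models.

```python
from dataclasses import dataclass


@dataclass
class ConceptNode:
    """Stand-in for the qortex model; field names are assumptions."""

    id: str
    name: str


@dataclass
class ConceptEdge:
    """Stand-in for the qortex model; field names are assumptions."""

    source_id: str
    target_id: str
    relation_type: str


def dangling_edges(
    concepts: list[ConceptNode],
    edges: list[ConceptEdge],
) -> list[ConceptEdge]:
    """Return edges whose source or target is not among the given concepts."""
    known = {c.id for c in concepts}
    return [e for e in edges if e.source_id not in known or e.target_id not in known]


concepts = [ConceptNode("a", "Chunking"), ConceptNode("b", "Overlap")]
edges = [ConceptEdge("a", "b", "USES"), ConceptEdge("a", "z", "REQUIRES")]
bad = dangling_edges(concepts, edges)  # only the edge pointing at unknown "z"
```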
Requirements
- Python 3.11+
- qortex (for core models: IngestionManifest, ConceptNode, etc.)
- anthropic (optional, for Claude extraction)
- pymupdf + pdfplumber (optional, for PDF support)
License
MIT