Multi-source knowledge graph engine with schema-guided extraction, provenance tracking, confidence scoring, and quality gates — the engine powering structured AI knowledge.
Nodes are facts: concepts, operators, frameworks, cases. Edges are relationships: teaches, extends, illustrates. Confidence encodes trust. Traversal follows edges outward from any node to discover connected knowledge.
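A minimal sketch of the node/edge model described above, using plain Python dataclasses. The field names `id`, `node_type`, `label` and `source` mirror the `POST /api/nodes` example further down this page. `Edge`, `relation`, `Graph` and the 0.3 default confidence are hypothetical choices for illustration, not the engine's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    node_type: str            # e.g. "concept", "operator", "framework", "case"
    label: str
    source: str               # owning domain, e.g. "creative"
    confidence: float = 0.3   # trust in [0, 1]; hypothetical default = model tier

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")

@dataclass
class Edge:
    src: str
    dst: str
    relation: str             # e.g. "teaches", "extends", "illustrates"

@dataclass
class Graph:
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: list[Edge] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.id] = node

    def add_edge(self, edge: Edge) -> None:
        # Refuse dangling edges: both endpoints must already be stored.
        if edge.src not in self.nodes or edge.dst not in self.nodes:
            raise KeyError(f"unknown endpoint in {edge}")
        self.edges.append(edge)

# Illustrative data only.
g = Graph()
g.add_node(Node("scamper", "framework", "SCAMPER", "creative", 0.9))
g.add_node(Node("substitute", "operator", "Substitute", "creative", 0.6))
g.add_edge(Edge("scamper", "substitute", "teaches"))
```

In this sketch, the domain (what a "framework" or "operator" is) lives entirely in the data. That matches the idea that the engine itself stays domain-agnostic.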
Graph databases store and query relationships. h2t-graphs goes further — schema-guided extraction, enrichment pipeline, quality gates, and LLM-optimized output. Your agent gets answers, not rows.
3-layer ownership model. The engine knows nothing about Houdini, TouchDesigner, or creative thinking. It stores, scores, queries, and validates. Consumers bring domain data and get domain answers.
```mermaid
graph LR
  subgraph CLIENT ["CLIENT LAYER"]
    D1["h2t-ai<br/>DCC Operators"] & D2["creative-thinking<br/>Methodology"] & D3["h2t-transcription<br/>Courses"]
  end
  subgraph ENGINE ["h2t-graphs ENGINE"]
    E1["Store"] & E2["Score"] & E3["Search"] & E4["Validate"] & E5["Enrich"]
  end
  subgraph STUDIO ["GRAPH STUDIO (future)"]
    S1["Templates"] & S2["LLM Copilot"] & S3["ETL Pipeline"]
  end
  CLIENT -->|"data + feedback"| ENGINE
  ENGINE -->|"answers"| CLIENT
  STUDIO -->|"schema + enrichment"| ENGINE
  style ENGINE fill:#1a0a10,stroke:#e94560,color:#e94560
  style CLIENT fill:#0a1a0d,stroke:#00ff88,color:#c0c0d0
  style STUDIO fill:#0a0d1a,stroke:#4a9eff,color:#c0c0d0
```
Currently serving: DCC operators (4,363), creative methodology (901), course transcripts (9,379), TRIZ principles, and design thinking.
Template-first approach: strict typed constraints → pre-filter invalid pairs → guided-reasoning LLM → judge with traceability → quality gate. Without template constraints, LLM accuracy is 40%. With templates plus guided reasoning it reaches 68%, and the target is 85%+.
Phase 1: name patterns, bridges, and structural edges. No LLM is needed, so this phase is free and fast.
Phase 2: Gemini embeddings surface potential relationships. Node pairs whose cosine similarity is above a threshold become candidates.
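The embedding-similarity step can be sketched as a pairwise cosine scan. This is a minimal sketch under stated assumptions. The toy 3-d vectors stand in for real Gemini embeddings, and the 0.8 threshold is hypothetical. The O(n²) loop is for clarity only; at scale, an approximate nearest-neighbour index (such as the HNSW index used for semantic search) would replace it.

```python
import math
from itertools import combinations

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def candidate_pairs(embeddings: dict[str, list[float]],
                    threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """All node pairs whose cosine similarity is above the threshold, best first."""
    hits = []
    for (a, va), (b, vb) in combinations(sorted(embeddings.items()), 2):
        sim = cosine(va, vb)
        if sim > threshold:
            hits.append((a, b, sim))
    return sorted(hits, key=lambda h: h[2], reverse=True)

# Toy vectors standing in for real embeddings (hypothetical data).
toy = {
    "noise": [0.9, 0.1, 0.0],
    "turbulence": [0.8, 0.2, 0.1],
    "render": [0.0, 0.1, 0.9],
}
pairs = candidate_pairs(toy)   # only ("noise", "turbulence") clears 0.8
```

These pairs are only candidates. In the pipeline described here they still pass through the template pre-filter and the judge before becoming edges.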
Phase 3: template constraints reject invalid pairs before any LLM call. A 5-step guided-reasoning classifier handles the rest.
Phase 4: an LLM judge performs 4-step validation with parsed traceability, then routes each candidate to accept, reject, or flag for human review.
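The judge's routing can be sketched as parse-then-route. The sketch assumes a hypothetical JSON reply with `valid`, `confidence` and a four-item `steps` list, and a hypothetical 0.8 confidence cut-off. The page does not document the real prompt or reply format. Anything unparseable, incomplete, or uncertain is flagged for a human rather than guessed at.

```python
import json
from dataclasses import dataclass

@dataclass
class Verdict:
    route: str          # "accept" | "reject" | "flag"
    steps: list[str]    # the judge's per-step reasoning, kept for traceability

def route_candidate(raw_judge_output: str, accept_at: float = 0.8) -> Verdict:
    """Parse a (hypothetical) JSON judge reply and route the candidate edge."""
    try:
        reply = json.loads(raw_judge_output)
        steps = [str(s) for s in reply["steps"]]
        valid = bool(reply["valid"])
        conf = float(reply["confidence"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        # No usable trace: a human has to look at it.
        return Verdict("flag", [])
    if len(steps) != 4 or conf < accept_at:
        return Verdict("flag", steps)   # incomplete trace or low certainty
    return Verdict("accept" if valid else "reject", steps)

reply = json.dumps({"valid": True, "confidence": 0.93,
                    "steps": ["types ok", "direction ok", "evidence found", "not a duplicate"]})
verdict = route_candidate(reply)   # Verdict(route="accept", steps=[...])
```

Keeping the parsed `steps` on every verdict, including flags, is what makes each routing decision traceable afterwards.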
Phase 5: a domain expert reviews a sample of decisions, calibrating the LLM judge's accuracy.
Phase 6: a quality gate checks health score, orphan rate, and modularity. A graph that passes is promoted to production.
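A quality gate of this kind can be sketched with two of its checks. The sketch assumes hypothetical thresholds (at most 10% orphans, mean confidence of at least 0.6) and uses mean confidence as a stand-in for the unspecified health score. Modularity is omitted because it needs community detection first; `networkx.algorithms.community.modularity` is one existing way to compute it.

```python
def orphan_rate(node_ids: set[str], edges: list[tuple[str, str]]) -> float:
    """Share of nodes that appear in no edge."""
    if not node_ids:
        return 0.0
    connected = {n for edge in edges for n in edge}
    return len(node_ids - connected) / len(node_ids)

def passes_gate(node_ids: set[str], edges: list[tuple[str, str]],
                confidences: dict[str, float],
                max_orphan_rate: float = 0.1,
                min_mean_confidence: float = 0.6) -> bool:
    """Promote only if orphans are rare and mean confidence is high enough."""
    mean_conf = sum(confidences.values()) / len(confidences) if confidences else 0.0
    return orphan_rate(node_ids, edges) <= max_orphan_rate and mean_conf >= min_mean_confidence

# Hypothetical staging graph: "d" is an orphan until the third edge is added.
nodes = {"a", "b", "c", "d"}
conf = {"a": 0.9, "b": 0.7, "c": 0.6, "d": 0.8}
staging_edges = [("a", "b"), ("b", "c")]
fixed_edges = staging_edges + [("c", "d")]
```

With `staging_edges` the orphan rate is 0.25 and the gate blocks promotion. With `fixed_edges` it is 0.0 and the graph passes.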
- Strict templates: a typed allowed_pairs constraint matrix per relation type
- Decision log: a JSONL trace of every decision (prompt, response, routing)
- Schema proposals: unknown relations are quarantined, not silently dropped
- Pre-filter: structurally invalid pairs are rejected before the LLM (free)
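The allowed_pairs matrix, the pre-filter, schema-proposal quarantine and the JSONL log fit together roughly as follows. This is a minimal sketch. The `ALLOWED_PAIRS` contents and the log-record fields are hypothetical. Only the pre-filter stage is logged here, whereas the real decision log also records the prompt and response of later LLM stages.

```python
import io
import json
from typing import TextIO

# Hypothetical template: relation type -> allowed (source_type, target_type) pairs.
ALLOWED_PAIRS: dict[str, set[tuple[str, str]]] = {
    "teaches": {("framework", "operator"), ("case", "concept")},
    "extends": {("operator", "operator"), ("concept", "concept")},
    "illustrates": {("case", "concept"), ("case", "operator")},
}

def prefilter(relation: str, src_type: str, dst_type: str, log: TextIO) -> str:
    """Decide before any LLM call: 'quarantine', 'reject', or 'to_llm'."""
    if relation not in ALLOWED_PAIRS:
        decision = "quarantine"   # unknown relation: kept as a schema proposal
    elif (src_type, dst_type) not in ALLOWED_PAIRS[relation]:
        decision = "reject"       # structurally invalid: no LLM call spent
    else:
        decision = "to_llm"       # passes the matrix: guided reasoning classifies it
    # One JSON object per line gives an append-only decision trace.
    record = {"relation": relation, "pair": [src_type, dst_type], "decision": decision}
    log.write(json.dumps(record) + "\n")
    return decision

buf = io.StringIO()                               # stands in for a .jsonl file
prefilter("teaches", "framework", "operator", buf)   # "to_llm"
prefilter("teaches", "case", "framework", buf)       # "reject"
prefilter("inspires", "case", "concept", buf)        # "quarantine"
```

The checks cost only a dictionary lookup and a set lookup. That is why invalid pairs can be removed before the LLM at no cost.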
Keyword precision, semantic understanding, or both. Edge-enriched embeddings capture graph context, not just node text.
Keyword search: case-insensitive text match across configurable fields, with weighted scoring (relevance × source weight × confidence). Fast, precise, deterministic.
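The weighted keyword score (relevance × source weight × confidence) can be sketched as below. The formula's three factors come from the text. The relevance measure (share of query terms found as substrings), the `source_weights` values and the sample nodes are hypothetical, because the page does not define them.

```python
def keyword_score(node: dict, query: str, fields: list[str],
                  source_weights: dict[str, float]) -> float:
    """relevance x source weight x confidence, with a hypothetical relevance measure."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    haystack = " ".join(str(node.get(f, "")) for f in fields).lower()  # case-insensitive
    relevance = sum(t in haystack for t in terms) / len(terms)         # share of terms matched
    weight = source_weights.get(node["source"], 1.0)
    return relevance * weight * node["confidence"]

nodes = [
    {"id": "n1", "label": "Noise generation", "source": "td", "confidence": 0.9},
    {"id": "n2", "label": "Noise in photos", "source": "creative", "confidence": 0.5},
]
weights = {"td": 1.0, "creative": 0.8}
ranked = sorted(nodes,
                key=lambda n: keyword_score(n, "noise generation", ["label"], weights),
                reverse=True)
```

Here `n1` scores 1.0 × 1.0 × 0.9 = 0.9 and `n2` scores 0.5 × 0.8 × 0.5 = 0.2. Because the score is multiplicative, a low-confidence node never outranks an equally relevant trusted one from the same source.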
Semantic search: Gemini embeddings (768 dimensions) in an HNSW index. Embeddings are edge-enriched: each node's embedding includes its graph neighborhood, so concepts are found even when the wording doesn't match.
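"Edge-enriched" can be illustrated by the text a node is embedded from. The idea is that its own label plus a few statements about its edges are serialized together before being sent to the embedding model. This is a sketch of the idea only. The serialization format, the `max_neighbors` cap and the sample data are hypothetical, and the actual Gemini embedding call and HNSW insertion are left out.

```python
def enriched_text(node_id: str, labels: dict[str, str],
                  edges: list[tuple[str, str, str]], max_neighbors: int = 10) -> str:
    """Text to embed for one node: its label plus up to max_neighbors edge statements.

    Edges are (src, relation, dst) triples. Hypothetical format, not the engine's own.
    """
    lines = [labels[node_id]]
    for src, rel, dst in edges:
        if src == node_id:
            lines.append(f"this {rel} {labels[dst]}")    # outgoing edge
        elif dst == node_id:
            lines.append(f"{labels[src]} {rel} this")    # incoming edge
        if len(lines) > max_neighbors:                   # label + max_neighbors edges
            break
    return "\n".join(lines)

labels = {"scamper": "SCAMPER", "sub": "Substitute", "case1": "Logo redesign"}
edges = [("scamper", "teaches", "sub"), ("case1", "illustrates", "sub")]
text = enriched_text("sub", labels, edges)
```

The node "Substitute" is embedded together with "SCAMPER teaches this" and "Logo redesign illustrates this". A query that mentions SCAMPER can therefore land near it even though its label never uses that word.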
Hybrid search: keyword and semantic results are merged with score normalization. Optional LLM query expansion adds synonyms before embedding. Best of both worlds.
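Score normalization is needed because keyword scores and cosine similarities live on different scales. This sketch uses min-max normalization and a weighted sum with a hypothetical `alpha = 0.5`. That is one common choice (reciprocal rank fusion is another), and the page does not say which one the engine uses. LLM query expansion is not shown.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so keyword and cosine scores become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}

def hybrid(keyword: dict[str, float], semantic: dict[str, float],
           alpha: float = 0.5) -> list[tuple[str, float]]:
    """Weighted sum of normalized scores; a node missing from one list scores 0 there."""
    kw, sem = min_max(keyword), min_max(semantic)
    ids = kw.keys() | sem.keys()
    merged = {i: alpha * kw.get(i, 0.0) + (1 - alpha) * sem.get(i, 0.0) for i in ids}
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical raw scores from the two retrievers.
results = hybrid({"n1": 0.9, "n2": 0.5, "n4": 0.1},
                 {"n2": 0.81, "n3": 0.78, "n1": 0.40})
```

`n2` wins (0.75) because it is solid in both lists. `n1` (0.5) leads only on keywords, and `n3` (about 0.46) appears only in the semantic results.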
Graph traversal: BFS from any node with depth and type filters. It is confidence-gated, so traversal passes only through trusted nodes. It runs in the Python runtime and is not locked to a database.
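Confidence-gated BFS can be sketched with a deque. The sketch assumes an adjacency list of `(relation, neighbour)` pairs and hypothetical defaults (`max_depth=2`, `min_confidence=0.6`). A node that fails the confidence or type filter is neither returned nor expanded, so nothing behind it is reached through it. The start node itself is not gated.

```python
from collections import deque
from typing import Optional

def traverse(start: str, adjacency: dict[str, list[tuple[str, str]]],
             node_types: dict[str, str], confidence: dict[str, float],
             max_depth: int = 2, min_confidence: float = 0.6,
             allowed_types: Optional[set[str]] = None) -> dict[str, int]:
    """Breadth-first search returning {node_id: depth} for reachable, trusted nodes."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        depth = seen[current]
        if depth == max_depth:
            continue                      # depth limit: do not expand further
        for _relation, nxt in adjacency.get(current, []):
            if nxt in seen:
                continue
            if confidence.get(nxt, 0.0) < min_confidence:
                continue                  # gate: never step onto or through low-trust nodes
            if allowed_types is not None and node_types.get(nxt) not in allowed_types:
                continue                  # type filter
            seen[nxt] = depth + 1
            queue.append(nxt)
    return seen

# Hypothetical graph: "c" is low-confidence, so "e" behind it is unreachable.
adjacency = {"a": [("teaches", "b"), ("extends", "c")],
             "b": [("extends", "d")],
             "c": [("illustrates", "e")]}
node_types = {k: "concept" for k in "abcde"}
confidence = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.7, "e": 0.9}
reached = traverse("a", adjacency, node_types, confidence)   # {"a": 0, "b": 1, "d": 2}
```

Because the graph lives in plain Python structures here, depth, trust and type rules can be changed per query without database-specific query syntax.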
Not all knowledge is equal. LLM-generated content starts at low confidence. Expert-verified content earns trust. The engine tracks the journey.
```mermaid
graph LR
  M["MODEL<br/>confidence: 0.3"] -->|"human review"| S["SESSION<br/>confidence: 0.6"]
  S -->|"expert validation"| E["EXPERT<br/>confidence: 0.9"]
  E -->|"feedback refines"| S
  style M fill:#0e0e14,stroke:#3a3a50,color:#a0a0b8
  style S fill:#0a0d1a,stroke:#4a9eff,color:#4a9eff
  style E fill:#0a1a0d,stroke:#00ff88,color:#00ff88,stroke-width:2px
```
Confidence by source type: expert (0.8–1.0) · docs (0.9) · session (0.6–0.8) · model (0.3–0.5). Queries filter by min_confidence, and feedback adjusts scores in real time.
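The per-source ranges and the `min_confidence` filter can be sketched as below. The ranges are copied from the text. The feedback rule (a ±0.05 nudge clamped to the source type's range) is a hypothetical stand-in, because the page does not describe the actual update. Under this rule, feedback refines a score within its tier, while moving up a tier (model → session → expert) requires review, as in the diagram above.

```python
# Confidence ranges per source type, as listed above.
SOURCE_RANGES: dict[str, tuple[float, float]] = {
    "expert": (0.8, 1.0),
    "docs": (0.9, 0.9),
    "session": (0.6, 0.8),
    "model": (0.3, 0.5),
}

def apply_feedback(source_type: str, confidence: float, helpful: bool,
                   step: float = 0.05) -> float:
    """Nudge a score up or down, clamped to its source type's range (hypothetical rule)."""
    lo, hi = SOURCE_RANGES[source_type]
    nudged = confidence + step if helpful else confidence - step
    return min(hi, max(lo, nudged))

def filter_by_confidence(nodes: list[dict], min_confidence: float) -> list[dict]:
    """Keep only nodes at or above the query's min_confidence."""
    return [n for n in nodes if n["confidence"] >= min_confidence]

capped = apply_feedback("model", 0.5, helpful=True)   # stays at 0.5: model tier tops out
trusted = filter_by_confidence([{"id": "a", "confidence": 0.9},
                                {"id": "b", "confidence": 0.4}], 0.6)
```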
On-write embedding: every new node is vectorized automatically, together with its edge context. There are no batch jobs and no waiting. A node is semantically searchable the moment it is written.
```
# Write a node via API
POST /api/nodes
{
  "source": "my-domain",
  "node": {"id": "insight_42", "node_type": "insight", "label": "Pattern discovered", ...}
}
→ {"status": "ok", "node_id": "insight_42", "embedded": true}

# Immediately searchable
GET /api/query?source=my-domain&semantic=pattern+discovery
→ results include insight_42
```
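The same two calls can be made from the Python standard library. This sketch builds the requests first and sends them separately, so the request shape can be checked without a network. The endpoint paths, request body and `X-H2T-Token` header come from this page. The `Content-Type` header, the 30 s timeout and the assumption that responses are JSON are not documented here. For everyday use, the `h2t-client` package described below is the recommended route.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://graphs.lichtpfadstudio.com"   # host from the curl example on this page

def write_node_request(token: str, source: str, node: dict) -> urllib.request.Request:
    """Build (without sending) a POST /api/nodes request in the shape shown above."""
    body = json.dumps({"source": source, "node": node}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/nodes",
        data=body,
        method="POST",
        headers={"X-H2T-Token": token, "Content-Type": "application/json"},
    )

def semantic_query_request(token: str, source: str, text: str) -> urllib.request.Request:
    """Build a GET /api/query request; urlencode handles spaces and non-ASCII text."""
    qs = urllib.parse.urlencode({"source": source, "semantic": text})
    return urllib.request.Request(f"{BASE_URL}/api/query?{qs}",
                                  headers={"X-H2T-Token": token})

def send(req: urllib.request.Request) -> dict:
    """Send a request and decode the JSON reply (assumed response format)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))

node = {"id": "insight_42", "node_type": "insight", "label": "Pattern discovered"}
write_req = write_node_request("YOUR_TOKEN", "my-domain", node)
query_req = semantic_query_request("YOUR_TOKEN", "my-domain", "pattern discovery")
# With a real token: send(write_req), then send(query_req)
```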
Without schema constraints, LLM extraction reaches only 40–76% accuracy. Schema-guided extraction with typed constraints is designed to close that gap.
| Tool | Schema | Enrichment | Quality Gates | Provenance | Runtime |
|---|---|---|---|---|---|
| Neo4j | No | No | No | No | Graph DB |
| GraphRAG | Optional | Extract only | No | No | Library |
| LightRAG | No | No | No | No | Library |
| KG-Gen | Optional | No | No | No | Library |
| h2t-graphs | Default (strict) | 6-phase pipeline | Yes | Yes | API + Engine |
Differentiator: Schema-guided extraction as default (not optional), with provenance, enrichment pipeline, and quality gates in one package.
h2t-graphs is the persistence layer in a larger feedback loop. Evaluation, optimization, and continuous learning flow through the graph.
```mermaid
graph LR
  subgraph CLIENTS ["CLIENTS"]
    C1["DCC Skill"] & C2["Creative"] & C3["Transcription"]
  end
  subgraph GRAPHS ["h2t-graphs"]
    G1["Store"] & G2["Search"] & G3["Enrich"] & G4["Score"]
  end
  subgraph EVALS ["h2t-evals"]
    V1["GT Packs"] & V2["Judge"] & V3["Optimizer"]
  end
  CLIENTS -->|"query + feedback"| GRAPHS
  GRAPHS -->|"metrics"| EVALS
  EVALS -->|"improve"| CLIENTS
  style GRAPHS fill:#1a0a10,stroke:#e94560,color:#e94560
  style CLIENTS fill:#0a1a0d,stroke:#00ff88,color:#c0c0d0
  style EVALS fill:#0a0d1a,stroke:#4a9eff,color:#c0c0d0
```
From template-first enrichment to a full Graph Studio with LLM copilot.
Strict templates with a typed allowed_pairs constraint matrix. A pre-filter rejects invalid pairs before the LLM. Guided-reasoning prompts (5-step classifier, 4-step judge). JSONL decision log. Schema-proposal quarantine. Multi-rubric validation with SGR gates.
Generalize the TouchDesigner (TD) template to 3+ domains. Template validation and versioning. Cross-domain template composition.
LLM Schema Copilot: answers to 5–7 questions produce a template stack. Multi-layer graph support. Cross-layer linking. 10 graph type families.
Lightweight: no Kubernetes, no managed services. One VPS, auto-TLS, full control. 601 tests. CI/CD: every push to main deploys.
All /api/ endpoints require an X-H2T-Token header. Documentation and this page are public.
Python (recommended):

```shell
pip install git+https://github.com/lichtpfad/h2t-client.git
```

```python
from h2t_client import GraphsClient

client = GraphsClient()  # auto-reads token from ~/.dor/secrets.env
results = client.search("td", "noise generation", semantic=True)
```
curl (the semantic query "инверсия" is Russian for "inversion"):

```shell
curl -H "X-H2T-Token: YOUR_TOKEN" \
  "https://graphs.lichtpfadstudio.com/api/query?source=creative&semantic=инверсия"
```
For API access, contact @prcdrl on Telegram.