Schema-Guided Knowledge Engine

Your agents need knowledge,
not another database

Multi-source knowledge graph engine with schema-guided extraction, provenance tracking, confidence scoring, and quality gates — the engine powering structured AI knowledge.


22K+ Nodes · 50K+ Edges · 601 Tests

how it works

What is a knowledge graph?

Nodes are facts — concepts, operators, frameworks, cases. Edges are relationships — teaches, extends, illustrates. Confidence encodes trust. Click any node to see how traversal discovers connected knowledge.

Node color = type · brightness = confidence · click to traverse

positioning

Not a graph database. A knowledge engine.

Graph databases store and query relationships. h2t-graphs goes further — schema-guided extraction, enrichment pipeline, quality gates, and LLM-optimized output. Your agent gets answers, not rows.

Graph Database (Neo4j, ArangoDB)

Store nodes and edges
Cypher / AQL queries
Returns raw graph data
No opinion on quality or trust
Schema optional — no extraction
Agent must interpret results

h2t-graphs (Knowledge Engine)

Schema-guided extraction as default path
Provenance on every node — who said it, how reliable
6-phase enrichment pipeline with quality gates
Semantic + keyword + hybrid search in one call
LLM-optimized output — scored, ranked, token-estimated
Built specifically for AI agent consumption

architecture

Engine owns infrastructure. Domain owns knowledge.

3-layer ownership model. The engine knows nothing about Houdini, TouchDesigner, or creative thinking. It stores, scores, queries, and validates. Consumers bring domain data and get domain answers.

graph LR
    subgraph CLIENT ["CLIENT LAYER"]
        D1["h2t-ai<br/>DCC Operators"] & D2["creative-thinking<br/>Methodology"] & D3["h2t-transcription<br/>Courses"]
    end
    subgraph ENGINE ["h2t-graphs ENGINE"]
        E1["Store"] & E2["Score"] & E3["Search"] & E4["Validate"] & E5["Enrich"]
    end
    subgraph STUDIO ["GRAPH STUDIO (future)"]
        S1["Templates"] & S2["LLM Copilot"] & S3["ETL Pipeline"]
    end
    CLIENT -->|"data + feedback"| ENGINE
    ENGINE -->|"answers"| CLIENT
    STUDIO -->|"schema + enrichment"| ENGINE
    style ENGINE fill:#1a0a10,stroke:#e94560,color:#e94560
    style CLIENT fill:#0a1a0d,stroke:#00ff88,color:#c0c0d0
    style STUDIO fill:#0a0d1a,stroke:#4a9eff,color:#c0c0d0

Currently serving: DCC operators (4,363), creative methodology (901), course transcripts (9,379), TRIZ principles, design thinking

enrichment pipeline

Schema-guided enrichment. Six phases.

Template-first approach: strict typed constraints → pre-filter invalid pairs → guided reasoning LLM → judge with traceability → quality gate. Without template constraints, LLM accuracy is 40%. With templates and guided reasoning it reaches 68%, against a target of 85%+.

Deterministic

Name patterns, bridges, structural edges. No LLM needed — free and fast.

Embedding Candidates

Cosine similarity above threshold. Gemini embeddings surface potential relationships.

Pre-filter + LLM Classify

Template constraints reject invalid pairs before LLM. Guided reasoning (5-step) classifies the rest.

LLM-as-Judge

4-step validation with parsed traceability. Routes: accept, reject, or flag for human review.

Expert Spot-check

Sample review by domain expert. Calibrates LLM judge accuracy.

Quality Gate

Health score, orphan rate, modularity checks. Pass → promote to production.
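
The embedding-candidate phase can be illustrated with a minimal sketch. It assumes tiny in-memory vectors, hypothetical node ids, and a hypothetical threshold of 0.8; the engine's real threshold, its 768-dimensional Gemini embeddings, and its index are not reproduced here.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def candidate_pairs(embeddings, threshold=0.8):
    """Return (id_a, id_b, similarity) for every node pair at or above the threshold."""
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(sorted(embeddings.items()), 2):
        sim = cosine(vec_a, vec_b)
        if sim >= threshold:
            pairs.append((id_a, id_b, round(sim, 3)))
    return pairs

# Toy 3-dimensional vectors with hypothetical node ids
embeddings = {
    "noise_top": [1.0, 0.0, 0.0],
    "noise_chop": [0.9, 0.1, 0.0],
    "triz_inversion": [0.0, 0.0, 1.0],
}
candidates = candidate_pairs(embeddings)
```

Only the two noise nodes pass the threshold here; in a pipeline like the one described, such pairs would be handed to the pre-filter and classifier rather than written as edges directly.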

Strict templates — typed allowed_pairs constraint matrix per relation type
Decision log — JSONL trace of every decision: prompt, response, routing
Schema proposals — unknown relations quarantined, not silently dropped
Pre-filter — structurally invalid pairs rejected before LLM (free)
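
A rough sketch of how an allowed_pairs pre-filter, quarantine, and JSONL decision log might fit together. The template contents, field names, and routing labels below are assumptions for illustration, not the engine's actual schema.

```python
import json

# Hypothetical template: relation type -> allowed (source_type, target_type) pairs
ALLOWED_PAIRS = {
    "teaches": {("course", "concept"), ("course", "operator")},
    "extends": {("concept", "concept"), ("framework", "framework")},
    "illustrates": {("case", "concept")},
}

def prefilter(candidate):
    """Route a candidate edge without calling an LLM."""
    relation = candidate["relation"]
    pair = (candidate["source_type"], candidate["target_type"])
    if relation not in ALLOWED_PAIRS:
        return "quarantine"  # unknown relation: keep it as a schema proposal
    if pair not in ALLOWED_PAIRS[relation]:
        return "reject"      # structurally invalid pair
    return "classify"        # valid shape: pass on to the LLM classifier

def log_line(candidate, routing):
    """Serialize one decision as a single JSONL record."""
    return json.dumps({**candidate, "routing": routing}, sort_keys=True)

candidates = [
    {"source_type": "course", "target_type": "concept", "relation": "teaches"},
    {"source_type": "case", "target_type": "operator", "relation": "extends"},
    {"source_type": "concept", "target_type": "case", "relation": "contradicts"},
]
routes = [prefilter(c) for c in candidates]
decision_log = "\n".join(log_line(c, r) for c, r in zip(candidates, routes))
```

The first candidate goes to the classifier, the second is rejected for free, and the third is quarantined because its relation is not in the template.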

Four search modes. One API call.

Keyword precision, semantic understanding, or both. Edge-enriched embeddings capture graph context, not just node text.

≣

Keyword Search

Case-insensitive text match across configurable fields. Weighted scoring: relevance × source weight × confidence. Fast, precise, deterministic.
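
The weighted score can be sketched as below. How the engine computes relevance is not stated on this page, so the sketch assumes relevance is the fraction of configured fields containing the query; the field names and source weights are likewise hypothetical.

```python
def keyword_score(node, query, fields=("label", "description"), source_weights=None):
    """relevance x source weight x confidence, with a case-insensitive match."""
    source_weights = source_weights or {}
    needle = query.lower()
    hits = sum(1 for field in fields if needle in str(node.get(field, "")).lower())
    relevance = hits / len(fields)  # assumed definition of relevance
    weight = source_weights.get(node.get("source"), 1.0)
    return relevance * weight * node.get("confidence", 0.0)

# Hypothetical nodes
expert_node = {"label": "Noise TOP", "description": "Generates noise patterns",
               "source": "td", "confidence": 0.9}
session_node = {"label": "Noise tricks", "description": "Session notes",
                "source": "td", "confidence": 0.6}

expert_score = keyword_score(expert_node, "NOISE", source_weights={"td": 1.0})
session_score = keyword_score(session_node, "NOISE", source_weights={"td": 1.0})
```

Because no model call is involved, the same query over the same data always yields the same ranking.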

❀

Semantic Search

Gemini embeddings (768d) with HNSW index. Edge-enriched — each node's embedding includes its graph neighborhood. Finds concepts even when wording doesn't match.

✦

Hybrid + Expansion

Keyword and semantic merged with score normalization. Optional LLM query expansion adds synonyms before embedding. Best of both worlds.
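
One common way to merge two result lists is min-max normalization followed by a weighted sum. The sketch below uses that approach as an assumption; the page does not say which normalization or weighting the engine applies, and the node ids and alpha value are made up.

```python
def normalize(scores):
    """Min-max normalize a {node_id: score} mapping into [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {node_id: 1.0 for node_id in scores}
    return {node_id: (s - lo) / (hi - lo) for node_id, s in scores.items()}

def hybrid_merge(keyword, semantic, alpha=0.5):
    """Blend normalized keyword and semantic scores; alpha weights the keyword side."""
    kw, sem = normalize(keyword), normalize(semantic)
    merged = {
        node_id: alpha * kw.get(node_id, 0.0) + (1 - alpha) * sem.get(node_id, 0.0)
        for node_id in set(kw) | set(sem)
    }
    # Highest score first; ties broken by node id for stable output
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))

keyword_hits = {"noise_top": 3.0, "noise_chop": 1.0}
semantic_hits = {"noise_top": 0.92, "perlin_pattern": 0.80, "noise_chop": 0.86}
ranked = hybrid_merge(keyword_hits, semantic_hits)
```

Normalizing first matters because raw keyword scores and cosine similarities live on different scales; without it, one side would dominate the blend.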

⇄

Graph Traversal

BFS from any node with depth and type filters. Confidence-gated — traverse only through trusted nodes. Python runtime, not database-locked.
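
Confidence-gated traversal can be sketched with a plain breadth-first search. This version uses only the standard library rather than NetworkX, and the graph, node ids, and default thresholds are hypothetical.

```python
from collections import deque

def traverse(adjacency, nodes, start, max_depth=2, min_confidence=0.6, node_types=None):
    """BFS from start; returns {node_id: depth} for every node reached."""
    visited = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        depth = visited[current]
        if depth == max_depth:
            continue  # depth limit reached, do not expand further
        for neighbor in adjacency.get(current, []):
            if neighbor in visited:
                continue
            info = nodes[neighbor]
            if info["confidence"] < min_confidence:
                continue  # gate: never walk through untrusted nodes
            if node_types and info["node_type"] not in node_types:
                continue  # type filter
            visited[neighbor] = depth + 1
            queue.append(neighbor)
    return visited

nodes = {
    "noise_top": {"node_type": "operator", "confidence": 0.9},
    "noise_concept": {"node_type": "concept", "confidence": 0.8},
    "llm_guess": {"node_type": "concept", "confidence": 0.3},
    "hidden_case": {"node_type": "case", "confidence": 0.9},
}
adjacency = {
    "noise_top": ["noise_concept", "llm_guess"],
    "llm_guess": ["hidden_case"],
}
reached = traverse(adjacency, nodes, "noise_top")
```

In this toy graph `hidden_case` is trusted but stays unreachable, because the only path to it runs through a low-confidence node.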

provenance

Every node has a trust score

Not all knowledge is equal. LLM-generated content starts at low confidence. Expert-verified content earns trust. The engine tracks the journey.

graph LR
    M["MODEL<br/>confidence: 0.3"] -->|"human review"| S["SESSION<br/>confidence: 0.6"]
    S -->|"expert validation"| E["EXPERT<br/>confidence: 0.9"]
    E -->|"feedback refines"| S
    style M fill:#0e0e14,stroke:#3a3a50,color:#a0a0b8
    style S fill:#0a0d1a,stroke:#4a9eff,color:#4a9eff
    style E fill:#0a1a0d,stroke:#00ff88,color:#00ff88,stroke-width:2px

Source types: expert (0.8–1.0) · docs (0.9) · session (0.6–0.8) · model (0.3–0.5). Queries filter by min_confidence. Feedback adjusts scores in real time.
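
A minimal sketch of filtering by min_confidence and applying feedback. The ranges come from the source types listed above; clamping a score to its source type's range and the size of the feedback delta are assumptions, since the page does not describe the actual adjustment rule.

```python
# Confidence ranges per source type, as listed on this page
SOURCE_RANGES = {
    "expert": (0.8, 1.0),
    "docs": (0.9, 0.9),
    "session": (0.6, 0.8),
    "model": (0.3, 0.5),
}

def apply_feedback(node, delta):
    """Return a copy with confidence shifted by delta, clamped to its source range (assumed rule)."""
    lo, hi = SOURCE_RANGES[node["source_type"]]
    confidence = min(hi, max(lo, node["confidence"] + delta))
    return {**node, "confidence": round(confidence, 2)}

def filter_by_confidence(nodes, min_confidence):
    """Keep only nodes at or above the requested trust level."""
    return [n for n in nodes if n["confidence"] >= min_confidence]

# Hypothetical nodes
nodes = [
    {"id": "a", "source_type": "model", "confidence": 0.3},
    {"id": "b", "source_type": "session", "confidence": 0.6},
    {"id": "c", "source_type": "expert", "confidence": 0.9},
]
trusted = filter_by_confidence(nodes, min_confidence=0.6)
boosted = apply_feedback(nodes[0], delta=0.5)
```

Under this assumed rule, positive feedback alone cannot lift model-generated content above 0.5; moving to a higher tier would require the review step shown in the diagram.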

real-time

Write a node. Search it instantly.

On-write embedding — every new node is automatically vectorized with edge context. No batch jobs, no waiting. The node is semantically searchable the moment it's written.

# Write a node via API
POST /api/nodes
{
  "source": "my-domain",
  "node": {"id": "insight_42", "node_type": "insight", "label": "Pattern discovered", ...}
}

→ {"status": "ok", "node_id": "insight_42", "embedded": true}

# Immediately searchable
GET /api/query?source=my-domain&semantic=pattern+discovery
→ results include insight_42
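
The same two calls can be prepared from Python with the standard library. The sketch below only builds the request objects and does not send them. The host is taken from the curl example on this page, the token is a placeholder, the JSON content type is an assumption, and the node carries only the three fields shown above (the rest are elided in the original).

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://graphs.lichtpfadstudio.com"  # host from the curl example on this page
TOKEN = "YOUR_TOKEN"  # placeholder

def build_write_request(source, node):
    """Prepare POST /api/nodes with the token header (not sent here)."""
    body = json.dumps({"source": source, "node": node}).encode("utf-8")
    return Request(
        f"{BASE_URL}/api/nodes",
        data=body,
        method="POST",
        headers={"X-H2T-Token": TOKEN, "Content-Type": "application/json"},
    )

def build_query_url(source, semantic):
    """Build the GET /api/query URL with a URL-encoded semantic query."""
    return f"{BASE_URL}/api/query?" + urlencode({"source": source, "semantic": semantic})

write_req = build_write_request(
    "my-domain",
    {"id": "insight_42", "node_type": "insight", "label": "Pattern discovered"},
)
query_url = build_query_url("my-domain", "pattern discovery")
```

Passing `write_req` to `urllib.request.urlopen` would perform the write; the query URL needs the same `X-H2T-Token` header when it is fetched.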

competitive landscape

Why schema-guided matters

Without schema constraints, LLM extraction produces 40–76% accuracy. Schema-guided extraction with typed constraints changes the game.

Tool	Schema	Enrichment	Quality Gates	Provenance	Runtime
Neo4j	No	No	No	No	Graph DB
GraphRAG	Optional	Extract only	No	No	Library
LightRAG	No	No	No	No	Library
KG-Gen	Optional	No	No	No	Library
h2t-graphs	Default (strict)	6-phase pipeline	Yes	Yes	API + Engine

Differentiator: Schema-guided extraction as default (not optional), with provenance, enrichment pipeline, and quality gates in one package.

integrations

Part of a self-improving ecosystem

h2t-graphs is the persistence layer in a larger feedback loop. Evaluation, optimization, and continuous learning flow through the graph.

graph LR
    subgraph CLIENTS ["CLIENTS"]
        C1["DCC Skill"] & C2["Creative"] & C3["Transcription"]
    end
    subgraph GRAPHS ["h2t-graphs"]
        G1["Store"] & G2["Search"] & G3["Enrich"] & G4["Score"]
    end
    subgraph EVALS ["h2t-evals"]
        V1["GT Packs"] & V2["Judge"] & V3["Optimizer"]
    end
    CLIENTS -->|"query + feedback"| GRAPHS
    GRAPHS -->|"metrics"| EVALS
    EVALS -->|"improve"| CLIENTS
    style GRAPHS fill:#1a0a10,stroke:#e94560,color:#e94560
    style CLIENTS fill:#0a1a0d,stroke:#00ff88,color:#c0c0d0
    style EVALS fill:#0a0d1a,stroke:#4a9eff,color:#c0c0d0

h2t-graphs

Schema-guided knowledge engine
Provenance & confidence tracking
6-phase enrichment pipeline
Semantic + keyword search

h2t-evals

Ground truth pack management
Judge calibration & scoring
Prompt optimization (DSPy)
Quality regression detection

roadmap

What's next

From template-first enrichment to a full Graph Studio with LLM copilot.

v0.8

Template-First Enrichment (current)

Strict templates with typed allowed_pairs constraint matrix. Pre-filter rejects invalid pairs before LLM. Guided reasoning prompts (5-step classifier, 4-step judge). JSONL decision log. Schema proposals quarantine. Multi-rubric validation with SGR gates.

v0.9

Template Library (planned)

Generalize the TD (TouchDesigner) template to 3+ domains. Template validation and versioning. Cross-domain template composition.

v1.0

Graph Studio MVP (planned)

LLM Schema Copilot — 5-7 questions → template stack. Multi-layer graph support. Cross-layer linking. 10 graph type families.

stack

Built for solo + AI teams

Lightweight. No Kubernetes. No managed services. One VPS, auto-TLS, full control. 601 tests. CI/CD: push main → deploy.

Python 3.11

FastAPI

PostgreSQL + pgvector

NetworkX

Gemini Embeddings

HNSW Vector Index

On-Write Embedding

Enrichment Pipeline

Caddy + Auto-TLS

Docker

Token Auth (RO/RW)

CI/CD

access

Get API Access

All /api/ endpoints require an X-H2T-Token header. Documentation and this page are public.

Python (recommended):

pip install git+https://github.com/lichtpfad/h2t-client.git

from h2t_client import GraphsClient
client = GraphsClient()  # auto-reads token from ~/.dor/secrets.env
results = client.search("td", "noise generation", semantic=True)

curl:

curl -H "X-H2T-Token: YOUR_TOKEN" \
  "https://graphs.lichtpfadstudio.com/api/query?source=creative&semantic=инверсия"

For API access, contact @prcdrl on Telegram.
