Schema-Guided Knowledge Engine

Your agents need knowledge, not another database

Multi-source knowledge graph engine with schema-guided extraction, provenance tracking, confidence scoring, and quality gates — the engine powering structured AI knowledge.

22K+ nodes · 50K+ edges · 9 sources · 601 tests · 32 endpoints

how it works

What is a knowledge graph?

Nodes are facts — concepts, operators, frameworks, cases. Edges are relationships — teaches, extends, illustrates. Confidence encodes trust. Click any node to see how traversal discovers connected knowledge.

Node color = type · brightness = confidence · click to traverse
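
The idea of traversal can be sketched in a few lines of Python. This is a minimal illustration, not the engine's implementation: the `Node` class, the sample ids, the confidence values, and the `traverse` helper are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str
    node_type: str      # e.g. "concept", "operator", "framework", "case"
    confidence: float   # trust score in [0, 1]


# Hypothetical sample data; ids and scores are made up for illustration.
NODES = {
    n.id: n
    for n in [
        Node("triz", "framework", 0.9),
        Node("inversion", "concept", 0.8),
        Node("noise_op", "operator", 0.9),
        Node("draft_note", "case", 0.3),
    ]
}

# Directed, typed edges: (source id, relation, target id).
EDGES = [
    ("triz", "teaches", "inversion"),
    ("inversion", "illustrates", "noise_op"),
    ("inversion", "extends", "draft_note"),
]


def traverse(start: str, min_confidence: float = 0.0, max_depth: int = 2) -> list[str]:
    """Breadth-first walk from `start`, skipping nodes below `min_confidence`."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for src, _relation, dst in EDGES:
            if src != current or dst in seen:
                continue
            if NODES[dst].confidence < min_confidence:
                continue  # low-trust nodes are filtered out of the walk
            seen.add(dst)
            order.append(dst)
            queue.append((dst, depth + 1))
    return order
```

Calling `traverse("triz", min_confidence=0.5)` in this sketch reaches `inversion` and `noise_op` but leaves out the low-confidence `draft_note`.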

positioning

Not a graph database. A knowledge engine.

Graph databases store and query relationships. h2t-graphs goes further — schema-guided extraction, enrichment pipeline, quality gates, and LLM-optimized output. Your agent gets answers, not rows.

Graph Database (Neo4j, ArangoDB)

  • Store nodes and edges
  • Cypher / AQL queries
  • Returns raw graph data
  • No opinion on quality or trust
  • Schema optional — no extraction
  • Agent must interpret results

h2t-graphs (Knowledge Engine)

  • Schema-guided extraction as default path
  • Provenance on every node — who said it, how reliable
  • 6-phase enrichment pipeline with quality gates
  • Semantic + keyword + hybrid search in one call
  • LLM-optimized output — scored, ranked, token-estimated
  • Built specifically for AI agent consumption
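
What "scored, ranked, token-estimated" could look like for an agent is sketched below. The `Hit` fields, the 0.7 blend weight, and the four-characters-per-token heuristic are assumptions for illustration, not the engine's actual scoring.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    node_id: str
    text: str
    semantic: float   # similarity from vector search, 0..1
    keyword: float    # normalized keyword match score, 0..1


def hybrid_score(hit: Hit, alpha: float = 0.7) -> float:
    """Weighted blend of semantic and keyword scores (alpha is an assumed weight)."""
    return alpha * hit.semantic + (1 - alpha) * hit.keyword


def estimate_tokens(text: str) -> int:
    """Rough heuristic: about 4 characters per token, never less than 1."""
    return max(1, len(text) // 4)


def pack_for_llm(hits: list[Hit], token_budget: int) -> list[dict]:
    """Rank hits by hybrid score and keep the best ones that fit the budget."""
    packed, used = [], 0
    for hit in sorted(hits, key=hybrid_score, reverse=True):
        cost = estimate_tokens(hit.text)
        if used + cost > token_budget:
            continue  # too large for the remaining budget; try smaller hits
        packed.append({"id": hit.node_id, "score": round(hybrid_score(hit), 3), "tokens": cost})
        used += cost
    return packed
```

The point of the design is that the agent receives a ranked list with a known token cost, so it can decide what fits in its context window without parsing raw rows.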
architecture

Engine owns infrastructure. Domain owns knowledge.

3-layer ownership model. The engine knows nothing about Houdini, TouchDesigner, or creative thinking. It stores, scores, queries, and validates. Consumers bring domain data and get domain answers.

graph LR
    subgraph CLIENT ["CLIENT LAYER"]
        D1["h2t-ai<br/>DCC Operators"] & D2["creative-thinking<br/>Methodology"] & D3["h2t-transcription<br/>Courses"]
    end
    subgraph ENGINE ["h2t-graphs ENGINE"]
        E1["Store"] & E2["Score"] & E3["Search"] & E4["Validate"] & E5["Enrich"]
    end
    subgraph STUDIO ["GRAPH STUDIO (future)"]
        S1["Templates"] & S2["LLM Copilot"] & S3["ETL Pipeline"]
    end
    CLIENT -->|"data + feedback"| ENGINE
    ENGINE -->|"answers"| CLIENT
    STUDIO -->|"schema + enrichment"| ENGINE
    style ENGINE fill:#1a0a10,stroke:#e94560,color:#e94560
    style CLIENT fill:#0a1a0d,stroke:#00ff88,color:#c0c0d0
    style STUDIO fill:#0a0d1a,stroke:#4a9eff,color:#c0c0d0

Currently serving: DCC operators (4,363), creative methodology (901), course transcripts (9,379), TRIZ principles, design thinking

enrichment pipeline

Schema-guided enrichment. Six phases.

Template-first approach: strict typed constraints → pre-filter invalid pairs → guided-reasoning LLM → judge with traceability → quality gate. Without template constraints, LLM accuracy is 40%. With templates plus guided reasoning it is 68%; the target is 85%+.

01 · Deterministic
Name patterns, bridges, structural edges. No LLM needed: free and fast.

02 · Embedding Candidates
Cosine similarity above threshold. Gemini embeddings surface potential relationships.

03 · Pre-filter + LLM Classify
Template constraints reject invalid pairs before LLM. Guided reasoning (5-step) classifies the rest.

04 · LLM-as-Judge
4-step validation with parsed traceability. Routes: accept, reject, or flag for human review.

05 · Expert Spot-check
Sample review by domain expert. Calibrates LLM judge accuracy.

06 · Quality Gate
Health score, orphan rate, modularity checks. Pass → promote to production.
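
Phase 02 can be illustrated with plain cosine similarity. This is a sketch under assumptions: the 0.8 threshold and the `candidate_pairs` helper are hypothetical, the vectors are toy data rather than Gemini embeddings, and the all-pairs loop stands in for an indexed lookup (the stack lists an HNSW vector index).

```python
import math
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def candidate_pairs(vectors: dict[str, list[float]], threshold: float = 0.8):
    """Return (id_a, id_b, similarity) for every pair at or above the threshold."""
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(vectors.items(), 2):
        sim = cosine(vec_a, vec_b)
        if sim >= threshold:
            pairs.append((id_a, id_b, round(sim, 3)))
    # Strongest candidates first, so later phases see the likeliest edges early.
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

Pairs that clear the threshold are only candidates; in the pipeline described above they still pass through the pre-filter, the classifier, and the judge.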

Strict templates — typed allowed_pairs constraint matrix per relation type
Decision log — JSONL trace of every decision: prompt, response, routing
Schema proposals — unknown relations quarantined, not silently dropped
Pre-filter — structurally invalid pairs rejected before LLM (free)
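
A rough sketch of how an `allowed_pairs` matrix, the pre-filter, and the decision log might fit together. The relation names come from this page, but the node-type pairs, the routing labels, and both helper functions are assumptions, not the engine's real template format.

```python
import json

# Hypothetical template: relation type -> allowed (source_type, target_type) pairs.
ALLOWED_PAIRS = {
    "teaches": {("course", "concept"), ("course", "operator")},
    "extends": {("concept", "concept"), ("framework", "framework")},
    "illustrates": {("case", "concept")},
}


def prefilter(relation: str, source_type: str, target_type: str) -> str:
    """Route a candidate edge without calling an LLM.

    Returns "classify" (send to the LLM), "reject" (structurally invalid pair),
    or "quarantine" (relation not in the template, kept as a schema proposal).
    """
    if relation not in ALLOWED_PAIRS:
        return "quarantine"
    if (source_type, target_type) in ALLOWED_PAIRS[relation]:
        return "classify"
    return "reject"


def log_decision(candidate: dict, routing: str) -> str:
    """Serialize one decision as a JSONL line (one JSON object per line)."""
    return json.dumps({"candidate": candidate, "routing": routing}, sort_keys=True)
```

In this sketch a pre-filter decision has no prompt or response to record; an LLM phase would add those fields to the same log line.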

provenance

Every node has a trust score

Not all knowledge is equal. LLM-generated content starts at low confidence. Expert-verified content earns trust. The engine tracks the journey.

graph LR
    M["MODEL<br/>confidence: 0.3"] -->|"human review"| S["SESSION<br/>confidence: 0.6"]
    S -->|"expert validation"| E["EXPERT<br/>confidence: 0.9"]
    E -->|"feedback refines"| S
    style M fill:#0e0e14,stroke:#3a3a50,color:#a0a0b8
    style S fill:#0a0d1a,stroke:#4a9eff,color:#4a9eff
    style E fill:#0a1a0d,stroke:#00ff88,color:#00ff88,stroke-width:2px

Source types: expert (0.8–1.0) · docs (0.9) · session (0.6–0.8) · model (0.3–0.5). Queries filter by min_confidence. Feedback adjusts scores in real time.
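
The ranges above translate into a small lookup plus a filter. This is a sketch only: the confidence ranges are taken from this page, but starting at the bottom of a range and clamping feedback to it are assumptions about behavior the page does not spell out.

```python
# Confidence ranges per source type, as listed on this page.
CONFIDENCE_RANGES = {
    "expert": (0.8, 1.0),
    "docs": (0.9, 0.9),
    "session": (0.6, 0.8),
    "model": (0.3, 0.5),
}


def initial_confidence(source_type: str) -> float:
    """Assumed policy: a new node starts at the bottom of its source type's range."""
    return CONFIDENCE_RANGES[source_type][0]


def apply_feedback(confidence: float, source_type: str, delta: float) -> float:
    """Shift confidence by `delta`, clamped to the source type's range (assumed)."""
    low, high = CONFIDENCE_RANGES[source_type]
    return min(high, max(low, confidence + delta))


def filter_nodes(nodes: list[dict], min_confidence: float) -> list[dict]:
    """Keep only nodes whose confidence meets the query's min_confidence."""
    return [n for n in nodes if n["confidence"] >= min_confidence]
```

With these rules, model-generated content can never outrank expert content through feedback alone; it has to be promoted to a higher source type first, which matches the journey in the diagram.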

real-time

Write a node. Search it instantly.

On-write embedding — every new node is automatically vectorized with edge context. No batch jobs, no waiting. The node is semantically searchable the moment it's written.

# Write a node via API
POST /api/nodes
{
  "source": "my-domain",
  "node": {"id": "insight_42", "node_type": "insight", "label": "Pattern discovered", ...}
}

→ {"status": "ok", "node_id": "insight_42", "embedded": true}

# Immediately searchable
GET /api/query?source=my-domain&semantic=pattern+discovery
→ results include insight_42
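
The same write-then-search flow can be sketched with the Python standard library. The endpoints, the host, and the `X-H2T-Token` header come from this page; the helper names are hypothetical, and the functions only build the requests so the sketch runs without network access or a token.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://graphs.lichtpfadstudio.com"


def build_write_request(token: str, source: str, node: dict) -> Request:
    """Build (but do not send) the POST /api/nodes request."""
    body = json.dumps({"source": source, "node": node}).encode("utf-8")
    return Request(
        f"{BASE_URL}/api/nodes",
        data=body,
        method="POST",
        headers={"X-H2T-Token": token, "Content-Type": "application/json"},
    )


def build_search_url(source: str, semantic: str) -> str:
    """Build the GET /api/query URL; urlencode handles spaces and non-ASCII text."""
    return f"{BASE_URL}/api/query?" + urlencode({"source": source, "semantic": semantic})
```

To actually send the write, pass the request to `urllib.request.urlopen`. Writes presumably need a read-write token, since the stack lists separate RO and RW tokens.
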
competitive landscape

Why schema-guided matters

Without schema constraints, LLM extraction produces 40–76% accuracy. Schema-guided extraction with typed constraints changes the game.

Tool       | Schema           | Enrichment       | Quality Gates | Provenance | Runtime
Neo4j      | No               | No               | No            | No         | Graph DB
GraphRAG   | Optional         | Extract only     | No            | No         | Library
LightRAG   | No               | No               | No            | No         | Library
KG-Gen     | Optional         | No               | No            | No         | Library
h2t-graphs | Default (strict) | 6-phase pipeline | Yes           | Yes        | API + Engine

Differentiator: Schema-guided extraction as default (not optional), with provenance, enrichment pipeline, and quality gates in one package.

telemetry

By the Numbers


22K+ graph nodes · 50K+ graph edges · 9 sources · 32 API endpoints
601 tests · 8 ADRs · 6 enrichment phases · 11/12 readiness score

integrations

Part of a self-improving ecosystem

h2t-graphs is the persistence layer in a larger feedback loop. Evaluation, optimization, and continuous learning flow through the graph.

graph LR
    subgraph CLIENTS ["CLIENTS"]
        C1["DCC Skill"] & C2["Creative"] & C3["Transcription"]
    end
    subgraph GRAPHS ["h2t-graphs"]
        G1["Store"] & G2["Search"] & G3["Enrich"] & G4["Score"]
    end
    subgraph EVALS ["h2t-evals"]
        V1["GT Packs"] & V2["Judge"] & V3["Optimizer"]
    end
    CLIENTS -->|"query + feedback"| GRAPHS
    GRAPHS -->|"metrics"| EVALS
    EVALS -->|"improve"| CLIENTS
    style GRAPHS fill:#1a0a10,stroke:#e94560,color:#e94560
    style CLIENTS fill:#0a1a0d,stroke:#00ff88,color:#c0c0d0
    style EVALS fill:#0a0d1a,stroke:#4a9eff,color:#c0c0d0

h2t-graphs

  • Schema-guided knowledge engine
  • Provenance & confidence tracking
  • 6-phase enrichment pipeline
  • Semantic + keyword search

h2t-evals

  • Ground truth pack management
  • Judge calibration & scoring
  • Prompt optimization (DSPy)
  • Quality regression detection
roadmap

What's next

From template-first enrichment to a full Graph Studio with LLM copilot.

v0.8

Template-First Enrichment (current)

Strict templates with typed allowed_pairs constraint matrix. Pre-filter rejects invalid pairs before LLM. Guided reasoning prompts (5-step classifier, 4-step judge). JSONL decision log. Schema proposals quarantine. Multi-rubric validation with SGR gates.

Template Library (planned)

Generalize TD template to 3+ domains. Template validation and versioning. Cross-domain template composition.

v1.0

Graph Studio MVP (planned)

LLM Schema Copilot — 5-7 questions → template stack. Multi-layer graph support. Cross-layer linking. 10 graph type families.

stack

Built for solo + AI teams

Lightweight. No Kubernetes. No managed services. One VPS, auto-TLS, full control. 601 tests. CI/CD: push main → deploy.

Python 3.11
FastAPI
PostgreSQL + pgvector
NetworkX
Gemini Embeddings
HNSW Vector Index
On-Write Embedding
Enrichment Pipeline
Caddy + Auto-TLS
Docker
Token Auth (RO/RW)
CI/CD

Get API Access

All /api/ endpoints require an X-H2T-Token header. Documentation and this page are public.

Python (recommended):

pip install git+https://github.com/lichtpfad/h2t-client.git

from h2t_client import GraphsClient
client = GraphsClient()  # auto-reads token from ~/.dor/secrets.env
results = client.search("td", "noise generation", semantic=True)

curl:

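# -G with --data-urlencode percent-encodes the non-ASCII query ("инверсия" is Russian for "inversion")
curl -G -H "X-H2T-Token: YOUR_TOKEN" \
  --data-urlencode "source=creative" --data-urlencode "semantic=инверсия" \
  "https://graphs.lichtpfadstudio.com/api/query"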

For API access, contact @prcdrl on Telegram.