Donkai Runtime — Local AI Hub

donkai.org/runtime · Order of Honest Labels

Your local AI hub — like Ollama, orchestrated by Finn.

Donkai Runtime wraps Ollama, Grok, OpenAI, and Finn behind one OpenAI-compatible API on your machine. Persistent memory, NVIDIA GPU offload, and the full Finn Oracle — without rebuilding Ollama.

Gateway MVP: LIVE · local gateway on :7720 · Finn on :7700

Local gateway health → Finn operator UI → Architecture doc

Feature status

Feature	Status	Notes
Web catalog	LIVE	This page
OpenAI-compatible gateway	LIVE	`127.0.0.1:7720/v1`
Finn memory recall	LIVE	`GET /memory` on gateway
Ollama GPU routing	LIVE	`ollama/*` prefix
Model pull UI	ROADMAP	Use `ollama pull` today
Desktop app (Tauri)	ROADMAP	Browser + Finn UI for now
Apostle chat route	ROADMAP	Finn Apostle tools already work

Quick start

1. Install Ollama (an NVIDIA GPU is auto-detected).
2. Pull a model: `ollama pull llama3.2:3b`
3. Start Finn: `pm2 start ecosystem.finn.config.cjs --only finn-genesis-runtime`
4. Start the gateway: `pm2 start ecosystem.finn.config.cjs --only donkai-runtime`
5. Chat: `curl http://127.0.0.1:7720/v1/chat/completions -H "Content-Type: application/json" -d '{"model":"ollama/llama3.2:3b","messages":[{"role":"user","content":"hi"}]}'`
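
The chat step can also be scripted. The following is a minimal Python sketch using only the standard library. The helper names `build_chat_request` and `chat` are illustrative and not part of Donkai Runtime, and the response parsing assumes the gateway returns the usual OpenAI chat-completions shape (`choices[0].message.content`), which this page does not confirm.

```python
import json
import urllib.request

# Gateway address as listed in the feature table.
GATEWAY_BASE = "http://127.0.0.1:7720/v1"


def build_chat_request(model: str, prompt: str,
                       base: str = GATEWAY_BASE) -> urllib.request.Request:
    """Build (but do not send) a POST to /chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str) -> str:
    """Send the request and return the first choice's text.

    Assumption: the response follows the OpenAI chat-completions shape.
    """
    request = build_chat_request(model, prompt)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With Finn and the gateway running, `print(chat("ollama/llama3.2:3b", "hi"))` would send the same request as the curl command.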

Env vars (local only, never commit): XAI_API_KEY, OPENAI_API_KEY, optional DONKAI_GATEWAY_PORT

Model catalog

Loaded from builder-models.json. Ollama tags are discovered live when the gateway is running.


API routing

Prefix	Backend	Status

POST http://127.0.0.1:7720/v1/chat/completions
{ "model": "finn/operator", "messages": [{"role":"user","content":"recall last session"}] }
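
Model IDs on this page take the form `prefix/model` (for example `ollama/llama3.2:3b` and `finn/operator`). The sketch below shows one way a client or gateway might split such an ID into a backend and a model name. It is an illustration under that assumption, not the gateway's actual routing code, and `Route` and `split_model_id` are hypothetical names.

```python
from typing import NamedTuple


class Route(NamedTuple):
    backend: str  # text before the first "/", e.g. "ollama" or "finn"
    model: str    # everything after it, passed through to the backend


def split_model_id(model_id: str) -> Route:
    """Split 'prefix/model' on the first slash only.

    Splitting once keeps any later slashes or colons (such as the
    ':3b' tag) inside the model name.
    """
    backend, separator, model = model_id.partition("/")
    if not separator or not backend or not model:
        raise ValueError(f"expected 'prefix/model', got {model_id!r}")
    return Route(backend, model)
```

For example, `split_model_id("ollama/llama3.2:3b")` yields backend `ollama` and model `llama3.2:3b`, while an ID without a prefix raises `ValueError` instead of being silently routed.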

Finn + NVIDIA

Finn — Oracle Spine routes tools, voice (Piper CUDA), biometrics, Apostle, Cloudflare
Ollama — GPU inference for open models; gateway exposes as ollama/*
RTX 5090 — gpu_config.py enables TF32, TensorRT cache, ONNX CUDA for Piper + face auth
Memory — SQLite + Chroma via Finn; gateway proxies /memory
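
Memory recall through the gateway can be sketched the same way. This example assumes `/memory` sits at the gateway root rather than under `/v1`, and it returns the raw response text because the page does not document the response format. The helper names are hypothetical.

```python
import urllib.request

# Assumption: /memory is served at the gateway root, not under /v1.
GATEWAY_ROOT = "http://127.0.0.1:7720"


def build_memory_request(root: str = GATEWAY_ROOT) -> urllib.request.Request:
    """Build (but do not send) the GET /memory request."""
    return urllib.request.Request(f"{root}/memory", method="GET")


def fetch_memory(root: str = GATEWAY_ROOT) -> str:
    """Return the raw body of GET /memory as text.

    The body is not parsed, since its format is not specified here.
    """
    with urllib.request.urlopen(build_memory_request(root), timeout=30) as response:
        return response.read().decode("utf-8")
```

With the gateway running, `print(fetch_memory())` would show whatever Finn returns for the proxied memory route.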