donkai.org/runtime · Order of Honest Labels
Your local AI hub — like Ollama, orchestrated by Finn.
Donkai Runtime wraps Ollama, Grok, OpenAI, and Finn behind one OpenAI-compatible API on your machine. Persistent memory, NVIDIA GPU offload, and the full Finn Oracle — without rebuilding Ollama.
Gateway MVP LIVE
local :7720
Finn :7700
Feature status
| Feature | Status | Notes |
|---|---|---|
| Web catalog | LIVE | This page |
| OpenAI-compatible gateway | LIVE | 127.0.0.1:7720/v1 |
| Finn memory recall | LIVE | GET /memory on gateway |
| Ollama GPU routing | LIVE | ollama/* prefix |
| Model pull UI | ROADMAP | Use ollama pull today |
| Desktop app (Tauri) | ROADMAP | Browser + Finn UI for now |
| Apostle chat route | ROADMAP | Finn Apostle tools work |
Quick start
- Install Ollama (NVIDIA GPU auto-detected)
- Pull a model:
  ollama pull llama3.2:3b
- Start Finn:
  pm2 start ecosystem.finn.config.cjs --only finn-genesis-runtime
- Start gateway:
  pm2 start ecosystem.finn.config.cjs --only donkai-runtime
- Chat:
  curl http://127.0.0.1:7720/v1/chat/completions -d '{"model":"ollama/llama3.2:3b","messages":[{"role":"user","content":"hi"}]}'
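
The chat step can also be scripted. The following is a minimal Python sketch using only the standard library, assuming the gateway at 127.0.0.1:7720 accepts JSON bodies and returns the standard OpenAI response shape (`choices[0].message.content`); this page does not show a response, so that shape is an assumption. The names `build_chat_request` and `chat` are illustrative and not part of Donkai Runtime.

```python
import json
import urllib.request

# Assumption: the gateway listens on the address listed in the feature table.
GATEWAY_URL = "http://127.0.0.1:7720/v1"


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{GATEWAY_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the request and return the text of the first choice.

    Assumes an OpenAI-shaped response; adjust the lookup below if the
    gateway returns something different.
    """
    request = build_chat_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


# Usage (requires Ollama, Finn, and the gateway to be running):
#   print(chat("ollama/llama3.2:3b", "hi"))
```

Splitting request construction from sending makes it possible to inspect the payload without a running gateway.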
Model catalog
Loaded from builder-models.json. Ollama tags are discovered live when the gateway is running.
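
To illustrate how live Ollama tags could map onto gateway model names, here is a rough sketch. It queries Ollama's own `GET /api/tags` endpoint on its default port 11434 and prefixes each tag with `ollama/`. Whether the gateway discovers tags this way is an assumption; `to_gateway_ids` and `list_local_models` are hypothetical helper names.

```python
import json
import urllib.request
from typing import List

# Ollama's default local address; change it if your install differs.
OLLAMA_TAGS_URL = "http://127.0.0.1:11434/api/tags"


def to_gateway_ids(tags_payload: dict) -> List[str]:
    """Map an Ollama /api/tags payload to 'ollama/<tag>' model names."""
    return [f"ollama/{model['name']}" for model in tags_payload.get("models", [])]


def list_local_models(timeout: float = 5.0) -> List[str]:
    """Fetch locally pulled Ollama tags and return them in gateway form."""
    with urllib.request.urlopen(OLLAMA_TAGS_URL, timeout=timeout) as resp:
        return to_gateway_ids(json.load(resp))


# Usage (requires Ollama to be running):
#   for model_id in list_local_models():
#       print(model_id)
```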
API routing
| Prefix | Backend | Status |
|---|---|---|
POST http://127.0.0.1:7720/v1/chat/completions
{ "model": "finn/operator", "messages": [{"role":"user","content":"recall last session"}] }
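
Model names on this page take the form `prefix/name`, as in `ollama/llama3.2:3b` and `finn/operator`. The sketch below shows one way a client might split such a name before deciding where a request will go. It is an illustration only: the gateway's actual parsing rules are not documented here, and `split_model_id` is a hypothetical helper.

```python
from typing import Tuple


def split_model_id(model: str) -> Tuple[str, str]:
    """Split 'prefix/name' into (prefix, name) at the first slash.

    Raises ValueError when the prefix or the name is missing.
    """
    prefix, sep, name = model.partition("/")
    if not sep or not prefix or not name:
        raise ValueError(f"expected 'prefix/name', got {model!r}")
    return prefix, name


# Usage:
#   backend, name = split_model_id("finn/operator")
```

Splitting at the first slash only keeps any later slashes or colons inside the model name, so tags such as `llama3.2:3b` pass through unchanged.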
Finn + NVIDIA
- Finn: Oracle Spine routes tools, voice (Piper CUDA), biometrics, Apostle, Cloudflare
- Ollama: GPU inference for open models; gateway exposes as ollama/*
- RTX 5090: gpu_config.py enables TF32, TensorRT cache, ONNX CUDA for Piper + face auth
- Memory: SQLite + Chroma via Finn; gateway proxies /memory
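
As a sketch of the memory proxy, the snippet below issues `GET /memory` against the gateway. It assumes the route sits at the gateway root (not under `/v1`) and takes no required parameters; neither detail is confirmed on this page, and the response format is not documented, so the raw bytes are returned as-is. `build_memory_request` and `fetch_memory` are hypothetical names.

```python
import urllib.request

# Assumption: /memory is served from the gateway root on port 7720.
GATEWAY_ROOT = "http://127.0.0.1:7720"


def build_memory_request() -> urllib.request.Request:
    """Build (but do not send) a GET request for the gateway's /memory route."""
    return urllib.request.Request(f"{GATEWAY_ROOT}/memory", method="GET")


def fetch_memory(timeout: float = 10.0) -> bytes:
    """Send the request and return the raw response body."""
    with urllib.request.urlopen(build_memory_request(), timeout=timeout) as resp:
        return resp.read()


# Usage (requires Finn and the gateway to be running):
#   print(fetch_memory().decode("utf-8", errors="replace"))
```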