# Donkai Runtime — Local AI Hub

**Product vision:** An Ollama-like experience for the UnyKorn/Donkai stack — download models, chat locally, persistent memory, and a unified OpenAI-compatible API that routes to Ollama, Grok, Finn, Apostle, and cloud backends. **We wrap and route; we do not rebuild Ollama.**

| Layer | Status | Notes |
|-------|--------|-------|
| Web catalog (`/runtime/`) | **LIVE** | Model list, install docs, honest ROADMAP labels |
| Local gateway (`:7720`) | **LIVE (MVP)** | `donkai_gateway.py` — `/v1/models`, `/v1/chat/completions`, `/memory` |
| Finn orchestrator (`:7700`) | **LIVE** | Cognitive spine, tools, voice, biometrics |
| Ollama (`:11434`) | **LIVE** (if installed) | GPU offload via Ollama's CUDA backend |
| Memory (SQLite) | **LIVE** | Finn `MemoryStore` + FamilyKnowledge pattern |
| Desktop shell (Electron/Tauri) | **ROADMAP** | Today: browser at `http://127.0.0.1:7720` + Finn `web/operator.html` |
| Model pull UI | **ROADMAP** | Today: `ollama pull` CLI; web pull queue planned |
| Apostle agent lane | **ROADMAP** | Chain `:7332` wired in catalog; gateway stub |
| Cloudflare edge proxy | **LIVE (catalog)** | `donk-agent` `/api/runtime/*` — manifest only (no localhost from edge) |

---

## Architecture

```mermaid
flowchart TB
  subgraph Web["donkai.org"]
    LP["/runtime/ landing"]
    DOC["docs/DONKAI_RUNTIME.md"]
    MV["Metaverse Donk Foundry"]
    API["donk-agent /api/runtime/*"]
  end

  subgraph Desktop["Local machine — Kevan RTX 5090"]
    GW["donkai_gateway :7720<br/>OpenAI-compatible /v1"]
    FINN["Finn Runtime :7700<br/>Oracle Spine + Memory"]
    OLL["Ollama :11434<br/>CUDA GPU offload"]
    BRAIN["finn-brain :7710<br/>Rust/candle local"]
    GPU["NVIDIA CUDA<br/>TF32 · TensorRT cache"]
  end

  subgraph Cloud["Cloud backends — env keys only"]
    GROK["xAI Grok API"]
    OAI["OpenAI / Codex"]
    CF["Cloudflare Workers AI"]
  end

  subgraph Chain["Sovereign rails"]
    APO["Apostle Chain :7332"]
  end

  LP --> GW
  MV --> LP
  API --> LP
  GW -->|ollama/*| OLL
  GW -->|finn/*| FINN
  GW -->|grok/*| GROK
  GW -->|memory| FINN
  FINN --> GPU
  OLL --> GPU
  FINN --> BRAIN
  FINN --> APO
  FINN --> OAI
  FINN --> CF
```

---

## Components

### 1. Web — `donkai.org/runtime/`

Public landing page (like ollama.com/library):

- Model catalog from `runtime/builder-models.json`
- Install instructions (Ollama, gateway, PM2)
- **Honest labels:** LIVE vs ROADMAP per feature
- Links to Finn operator UI (`http://127.0.0.1:7700/ui/operator.html`) and gateway health

### 2. Desktop / Local UI

**MVP (today):**

| Surface | URL | Role |
|---------|-----|------|
| Donkai Gateway | `http://127.0.0.1:7720` | OpenAI-compatible API + minimal health |
| Finn Operator | `http://127.0.0.1:7700/ui/operator.html` | Chat, voice, biometrics, tools |
| Finn Talk | `http://127.0.0.1:7700/ui/talk.html` | Voice-first console |

**ROADMAP:** Tauri/Electron shell bundling gateway + chat + model pull.

### 3. API Gateway — `donkai_gateway.py` (`:7720`)

Unified OpenAI-compatible surface. Model routing by prefix:

| Prefix | Backend | Example |
|--------|---------|---------|
| `ollama/` | `http://127.0.0.1:11434` | `ollama/llama3.2` |
| `grok/` | xAI API (`XAI_API_KEY`) | `grok/grok-3` |
| `finn/` | Finn Runtime `:7700/v1/chat` | `finn/operator` |
| `openai/` | OpenAI (`OPENAI_API_KEY`) | `openai/gpt-4o` |
| `apostle/` | **ROADMAP** — Apostle agent RPC | `apostle/donk-agent` |

**Endpoints:**

```
GET  /health
GET  /v1/models
POST /v1/chat/completions
GET  /memory          → proxy Finn /v1/memory
POST /memory/search   → proxy Finn /v1/memory/search
```

Config: `donkai-hub/runtime/builder-models.json` (also copied to `donk/apps/runtime/runtime-models.json`).

### 4. Memory Layer

Finn already implements persistent memory:

- **SQLite** — `MemoryStore` at `~/.finn/memory.db` (path from `genesis.json`)
- **Vector** — Chroma at `~/.finn/vectors`
- **FamilyKnowledge** — `FamilyContextEngine` resolves speaker → family context
- **Burnzy bus** — optional cross-agent memory at loopback

Donkai Gateway exposes read/search via Finn proxy; writes go through Finn `/v1/memory` or chat spine auto-save.

**ROADMAP:** D1 sync for federation recall; KV session cache at edge.

### 5. NVIDIA GPU

| Component | GPU use |
|-----------|---------|
| Ollama | Native CUDA — `ollama run` uses GPU when available |
| Finn Piper TTS | ONNX `CUDAExecutionProvider` via `kernel/gpu_config.py` |
| Finn face auth | InsightFace + TensorRT cache `~/.finn/trt_cache` |
| Finn PyTorch model | `FinnModelProvider` when weights present |
| finn-brain | Rust/candle on `:7710` — sovereign fallback |

**Env:** `FINN_REQUIRE_GPU=1` (strict) or `0` (CPU fallback). RTX 5090 tuning in `gpu_config.apply()`.

**Recommended local models (Ollama):**

- `llama3.2:3b` — fast operator assist
- `qwen2.5:7b` — builder lane (already used in platform-hub builds)
- `deepseek-r1:8b` — reasoning
- `nomic-embed-text` — embeddings (Finn ingestion complement)

### 6. Finn as Orchestrator

Finn Runtime (`:7700`) is the **sovereign orchestrator**, not replaced by the gateway:

```
Operator → Finn Spine → Intent → Plan → Retrieve (memory + vectors)
         → ToolBus (Oracle) → voice / biometrics / Apostle / Cloudflare / GitHub …
         → Provider (LocalBrain → FinnModel → OpenAI → Scaffold)
         → Verify → Respond
```

Donkai Gateway is the **compatibility shim** for tools expecting OpenAI (`ollama/`, `grok/`, etc.). Finn chat uses the full spine; gateway `finn/*` routes to `/v1/chat` (lighter path).

**Integrations already in Finn:**

- Apostle Chain tools (finn-apostle-integration skill)
- Grok personas (`kernel/personas/grok_personas.py`)
- OpenClaw / Hermes / Claude Code via Codex tool + operator configs
- Voice (Piper/PersonaPlex), biometrics (face/voice auth)

### 7. Cloudflare — `donk-agent`

Edge routes (catalog only — workers cannot reach localhost):

```
GET /api/runtime/catalog   — builder-models manifest + feature flags
GET /api/runtime/health    — static gateway docs + probe instructions
```

Live inference stays on the operator machine.

### 8. PM2 — `donkai-runtime`

```powershell
pm2 start C:\Users\Kevan\Finn\ecosystem.finn.config.cjs --only donkai-runtime
```

Runs `donkai_gateway.py` on port **7720**. Sits alongside `finn-genesis-runtime` (`:7700`).

---

## Ports Reference

| Port | Service |
|------|---------|
| 7720 | **Donkai Gateway** (OpenAI-compatible) |
| 7700 | Finn Runtime (orchestrator) |
| 7710 | finn-brain / status gateway |
| 11434 | Ollama |
| 7332 | Apostle Chain |
| 8998 | PersonaPlex (WSL GPU) |
| 3080 | OpenClaw Nerve |

---

## File Map

```
donkai-hub/
  runtime/
    index.html              # Public landing
    builder-models.json     # Routing catalog
  docs/
    DONKAI_RUNTIME.md       # This document

donk/apps/runtime/
  donkai_gateway.py         # Local FastAPI gateway
  runtime-models.json       # Symlink/copy of builder-models.json

unykorn-ops/
  workers/donk-agent/src/runtime.ts
  data/runtime/builder-models.json

Finn/
  ecosystem.finn.config.cjs # PM2: donkai-runtime
```

---

## Security

- **No API keys in git.** Use `.env.local`, Finn vault, or OS env.
- Gateway binds `127.0.0.1` only.
- Finn routes use `require_local_or_token` for protected endpoints.
- Cloudflare edge serves public catalog only.

---

## ROADMAP (honest)

1. Tauri desktop app with model pull UI
2. Apostle `apostle/*` chat routing
3. D1 federation memory sync
4. Auto-detect GPU VRAM → model recommendations
5. One-click `donkai-runtime` installer (Windows)
6. Hermes/OpenClaw as named gateway prefixes
