Project Plan Internal · v2026.05.14 SPEC.md · tasks.md · RUN-SUMMARY.md

LLM Runner Rebuild.

A clean, modular Python runner: codified infrastructure, hybrid graph + vector retrieval, single-person observability via Grafana Cloud. Ten phases (P0–P9), one solo operator, one filesystem-as-message-queue.

Owner
Dave Cockson
Started
2026-05-08
Targets
Contabo VPS · Homelab · Davas
Doc revision
2026-05-14 · §17
§00

Status at a glance

Source · RUN-SUMMARY
Current phase
Phase QoL — polish pass
13 of 17 shipped
Last session
Batch D — P-QoL.4 + P-QoL.2
2026-05-14 · registry unification + Discord success embeds
Blockers
None
16 pre-existing chat/health test failures (async mock fixup pending)
Next up
P-QoL.5 → P8.1
qdrant version drift · then briefing agent
Up next — pick up here
  1. P-QoL.5 — qdrant client/server version drift (client 1.17 vs server 1.14). Pick downgrade or upgrade; either path could break RAG retrieval, needs smoke-test guard rail. Recommended next, but only if symptomatic.
  2. Pre-existing test failures — 16 chat/health test failures (sync MagicMock on async probe functions). Small dedicated MR; would clear CI noise that's surfaced on three branches now.
  3. P-QoL.17 — Davas end-to-end verification (secret flipped to Tailscale in Batch C; this is the verification pass with worker logs tailed).
  4. Aesthetics polish pass — UI typography/spacing/cards on services/web/static/{index.html,style.css}. Deferred from earlier per user direction.
  5. P8.1: agents/briefing.py. Unblocked (Phase 7 closed); pick up once Phase QoL is in a place you're happy with.
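The failure mode named in item 2 (a sync MagicMock standing in for an async probe function) can be reproduced in isolation. This is a minimal sketch, not the project's test code: `check_health` and the probe objects are hypothetical names, and `unittest.mock` is used here only as the stdlib mocking library that pytest tests commonly import.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def check_health(probe) -> dict:
    """Hypothetical health-route body: awaits an async probe function."""
    return {"ollama": await probe()}


def run_with(probe) -> dict:
    return asyncio.run(check_health(probe))


# A plain MagicMock returns its return_value directly; a bool cannot be awaited,
# so the await inside check_health raises TypeError.
broken_probe = MagicMock(return_value=True)
try:
    run_with(broken_probe)
    failure = None
except TypeError as exc:
    failure = exc

# AsyncMock returns a coroutine that resolves to return_value when awaited.
fixed_probe = AsyncMock(return_value=True)
result = run_with(fixed_probe)
```

If the 16 failures do share this shape, swapping the mock type on the probe patches is likely most of the fix, which fits the "small dedicated MR" estimate.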
§01–02

Goals & non-negotiables

Why we're doing this · how we won't
Goals
  1. Rebuild the LLM runner as a clean, modular Python codebase Dave can extend solo.
  2. Codify all infrastructure (Contabo + homelab) in version-controlled IaC.
  3. Hybrid multi-agent system with both graph and vector retrieval — supports projects like pickles-gmbh-ai-governance-framework.
  4. Single-person observability via Grafana Cloud (logs + traces + metrics) with Discord alerts.
  5. Demonstrate industry-standard tooling for hiring conversations.
Non-negotiable rules
  • One job per file. 150 lines max per Python file.
  • Every module has a mirrored test in tests/.
  • No silent failures — errors surface explicitly.
  • No raw os.environ — Pydantic Settings v2, validated at startup.
  • No secrets in code, config, or git. Infisical → materialised .env.
  • If a file imports from >3 internal modules, split it.
  • Conventional commits: feat: fix: chore: test: docs: ci:
  • pytest only. No unittest, no inline test scripts.
  • requirements.txt is the deploy artifact — generated by uv pip compile.
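Several of these rules are mechanical enough to check in CI. The sketch below is one possible way to do that with the standard library only; it is not part of the repo. The set of "internal" package roots and the crude substring test for `os.environ` are assumptions made for illustration.

```python
import ast
from pathlib import Path

MAX_LINES = 150            # "150 lines max per Python file"
MAX_INTERNAL_IMPORTS = 3   # "imports from >3 internal modules, split it"
# Assumption: these top-level packages count as internal modules.
INTERNAL_ROOTS = {"services", "shared", "agents"}


def internal_modules(source: str) -> set:
    """Return the distinct internal modules imported by a source file."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        found.update(n for n in names if n.split(".")[0] in INTERNAL_ROOTS)
    return found


def check_source(source: str) -> list:
    """Return human-readable rule violations for one file's source text."""
    problems = []
    line_count = len(source.splitlines())
    if line_count > MAX_LINES:
        problems.append(f"{line_count} lines (max {MAX_LINES})")
    modules = internal_modules(source)
    if len(modules) > MAX_INTERNAL_IMPORTS:
        problems.append(f"imports {len(modules)} internal modules: {sorted(modules)}")
    # Crude textual check; it will also flag comments and strings.
    if "os.environ" in source:
        problems.append("raw os.environ access")
    return problems


def check_tree(root: Path) -> dict:
    """Check every .py file under root; only files with violations are returned."""
    report = {}
    for path in sorted(root.rglob("*.py")):
        problems = check_source(path.read_text(encoding="utf-8"))
        if problems:
            report[str(path)] = problems
    return report
```

A CI job could call `check_tree(Path("services"))` and fail when the returned dict is non-empty; known exceptions would need an allowlist.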
§03

Tech stack

Pinned choices
Layer | Choice
Language | Python 3.12
Package manager | uv → requirements.txt
Lint / format / types | ruff + mypy --strict
Web framework | FastAPI
Async | asyncio throughout
Config | Pydantic Settings v2
LLM orchestration | LangGraph
Vector store | Qdrant
Graph store | Neo4j Community
Agent KG library | graphiti-core (Zep)
MCP | FastMCP (jlowin)
Embeddings | nomic-embed-text via Ollama
Tracing | OTel SDK → Alloy → Tempo
Logs | stdlib → Alloy → Loki
Metrics | Prometheus → Alloy → Mimir
Alerts | Grafana Cloud → Discord
Testing | pytest + pytest-asyncio + httpx
Containers | Docker Compose
Secrets | Infisical Cloud (EU)
IaC (Contabo) | Terraform + Ansible
IaC (Homelab) | Ansible only
Object storage | AWS S3 (us-east-1)
CI/CD | GitLab CI
Public ingress · VPN | Cloudflare Tunnel · Tailscale
§05

Phase order

10 phases, sequential, gated
Phase | Outcome | Status
Pre-flight | Reconciliation answers PF.1–PF.6: Infisical, S3, GitLab/Gitea, runner cohabitation, Langfuse. | done
P0 | Infisical project + machine identities. AWS + 3 S3 buckets. GitLab repos with baseline CI. | done
P1 | Terraform state in S3. Contabo provisioned (firewall, Cloudflare DNS). Hosts bootstrapped via Ansible. | done
P2 | Observability live: Alloy on both hosts shipping logs+metrics+traces; Discord alerts wired. | done
P3 | Runner foundation: config, providers, Docker Compose boot. | done
P4 | Worker: queue poller, executor, job types, Discord notifications. | done
P5 | Web UI + chat at control.davidcockson.com. | done
P6 | MCP server, tools, research mode. | done
P7 | RAG (Qdrant) + KG (Neo4j + graphiti) hybrid retrieval. | done
P8 | Scheduled agents, usage tracking. | not started
P9 | Cutover: vault migration, decommission old runner, snapshot to S3. | not started
QoL | Quality-of-life polish pass: 13 of 17 shipped 2026-05-14 (UI output link, Discord success, deploy.sh, registry dedup, Gemini comment, MCP /health, Davas Tailscale, CF /health bypass, smoke runbook, RAG context-only, Neo4j auto-bootstrap, deploy wrapper, chat sessions, exec bit). Open: qdrant drift, RAG logger, Davas e2e, secrets manifest. | in progress
§07 · §10

Repository & runtime layout

Five repos · eight services
// gitlab.com/davidcockson/
├── infra-contabo     terraform + ansible
├── infra-homelab     ansible only
├── infra-shared      reusable roles
├── llm-runner        ⤵ python runner
└── platform-docs     runbooks + dashboards

// llm-runner/
├── services/
│   ├── web/        FastAPI + HTMX UI
│   ├── worker/     queue poller + executor
│   ├── mcp/        FastMCP tool server
│   └── rag/        LangGraph hybrid retrieval
├── shared/
│   ├── providers/  ollama·groq·gemini·anthropic
│   ├── config/     Pydantic Settings v2
│   ├── observability/
│   └── models/
├── agents/          briefing · research · memory
├── tests/           mirrors source tree
├── docker-compose.yml
├── pyproject.toml   uv-managed
└── requirements.txt generated, committed
Service | Port | Role
web | :8000 | FastAPI UI + chat
worker | - | Queue poller + executor
mcp | :8001 | FastMCP tool server
rag | :8002 | LangGraph hybrid retrieval
qdrant | :6333 | Vector store
neo4j | :7687 | Graph store (bolt)
cloudflared | - | Public tunnel
alloy | - | Logs / metrics / traces shipper
Ollama runs on the host at 127.0.0.1:11434, accessed via host.docker.internal. No ports exposed publicly. Volumes: obsidian-vault (bind to Syncthing path), qdrant-data, neo4j-data.
§12

Observability pipeline

One pipeline · Grafana Cloud · Discord
Sources: Runner code (OTel SDK) · Container logs (stdlib logging) · Host metrics (node_exporter · cAdvisor)
  → Alloy (contabo + homelab, single shipper)
  → Grafana Cloud: Tempo (traces) · Loki (logs) · Mimir (metrics)
  → Discord alerts (job · disk · davas)
§06

Model routing

Provider · purpose · quirks
Model | Provider | When to use | Quirk to remember
qwen2.5:14b | Ollama (Contabo) | Default chat + jobs · always-on | localhost via host.docker.internal
gemma4:27b | Ollama (Davas) | Project mode · deep work | UI greys out when health-check unreachable; never silently fall back
llama-3.3-70b-versatile | Groq | Fast iteration | Log 429s clearly; 60s timeout
models/gemini-2.5-flash | Gemini | Long context, planning | Needs models/ prefix; streaming token count cosmetic 0
qwen/qwen3-32b | Groq | Coding | Same Groq guards as above
claude-sonnet-4-6 | Anthropic | Paid · sparingly | max_tokens required; default 8192
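The routing rules in the table above, together with the credential-filtered `available_models(settings)` described in the run log, could look roughly like the following. This is a sketch under assumptions, not the contents of shared/providers/router.py: the `Settings` dataclass stands in for the Pydantic settings object, the `"davas"` provider label, the `route` function, and `ProviderUnavailable` are hypothetical, and only `_MODEL_PROVIDER` and `available_models` are names taken from the document.

```python
from dataclasses import dataclass
from typing import Optional

# Model -> provider label, following the routing table.
_MODEL_PROVIDER = {
    "qwen2.5:14b": "ollama",
    "gemma4:27b": "davas",
    "llama-3.3-70b-versatile": "groq",
    "models/gemini-2.5-flash": "gemini",
    "qwen/qwen3-32b": "groq",
    "claude-sonnet-4-6": "anthropic",
}


@dataclass(frozen=True)
class Settings:
    """Hypothetical stand-in for the validated settings object."""
    groq_api_key: Optional[str] = None
    gemini_api_key: Optional[str] = None
    anthropic_api_key: Optional[str] = None


class ProviderUnavailable(RuntimeError):
    """Raised instead of silently falling back to another model."""


def _configured(provider: str, settings: Settings) -> bool:
    # Assumption: local Ollama and Davas need no API key; hosted providers do.
    keys = {
        "groq": settings.groq_api_key,
        "gemini": settings.gemini_api_key,
        "anthropic": settings.anthropic_api_key,
    }
    return provider not in keys or bool(keys[provider])


def available_models(settings: Settings) -> list:
    """Registry entries whose provider has credentials configured."""
    return [m for m, p in _MODEL_PROVIDER.items() if _configured(p, settings)]


def route(model: str, settings: Settings, davas_healthy: bool) -> str:
    """Return the provider for a model, failing loudly when it cannot serve."""
    provider = _MODEL_PROVIDER.get(model)
    if provider is None:
        raise KeyError(f"unknown model: {model}")
    if not _configured(provider, settings):
        raise ProviderUnavailable(f"{provider} has no credentials configured")
    if provider == "davas" and not davas_healthy:
        raise ProviderUnavailable("Davas health check failed; refusing to fall back")
    return provider
```

Keeping a single mapping as the source of truth for both the UI model list and the router is the point of the registry unification: the UI cannot offer a model the router does not know.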
§14

Phase task lists

From tasks.md · ticked when done
Sonnet: default execution
Opus: subtle / cross-cutting
Haiku: mechanical / scaffolding
⚠ two-strikes → escalate
PF

Pre-flight reconciliation

Spec ↔ live infra · all answered
ID | Question | Resolution
PF.1 | Existing Infisical project name | Reuse homelab-rebuild (org davidcockson); only prod env in active use throughout the rebuild.
PF.2 | S3 bucket inventory | Three buckets confirmed in us-east-1: davidcockson-tfstate, -vault-snapshots, -artifacts (created 2026-05-08). IAM updated.
PF.3 | Backup restore drill | contabo-backup-v1.tar.gz (1.5 GB) extracted clean. Runbook at platform-docs/runbooks/restore-from-s3.md.
PF.4 | GitLab vs Gitea | GitLab.com for all rebuild repos. Gitea retained for Obsidian vault only.
PF.5 | Old runner cohabitation | New runner uses /root/obsidian-vault-v2 on Contabo until P9 cutover. No separate VM needed.
PF.6 | Langfuse decision | Langfuse stays for old runner only; torn down at P9 cutover (snapshot first).
§15 · §16

Scope guardrails

What we are not building
Out of scope · MVP
  • Kubernetes / K3s
  • MemPalace (replaced by Qdrant + Neo4j)
  • SearXNG
  • Langfuse (replaced by OTel + Tempo)
  • HuggingFace provider
  • OpenRouter
  • Voice interface, image generation, fine-tuning
  • mkdocs site (replacement TBD)
Open items to revisit
  • Documentation tool (mkdocs replacement)
  • Proxmox-Terraform later (currently Ansible-only)
  • Voice / image gen / fine-tuning post-MVP
log

Run log

Newest first · append-only
2026-05-14
Batch D — registry unification + Discord success notifications (Opus)
P-QoL.4: router.available_models(settings) exposed in shared/providers/router.py, returning _MODEL_PROVIDER entries filtered by configured credentials. services/web/routes/ui.py:list_models now calls it instead of holding a hardcoded list. Smoke-verified on Contabo: gemma4:27b (Davas-routed, not in local /api/tags) now appears on /models alongside dynamic Ollama entries. Bundled fix: extra="ignore" on SettingsConfigDict, so Compose-only .env vars (CLOUDFLARED_TUNNEL_TOKEN, GRAFANA_TEMPO_*) no longer crash Settings(). Unblocked 16 router-test errors.
P-QoL.2: JobOutcome(text, tokens, model) record widens handler return type. Executor gains _notify_success mirroring _notify_failure, a time.monotonic() timing wrapper, and _build_output_url (returns None when PUBLIC_BASE_URL unset). Chain handler's in-handler discord.notify deleted. docker-compose.yml worker env forwards PUBLIC_BASE_URL. test_discord_not_called_on_success inverted; +8 success-notify tests; -2 chain-discord tests. 55 worker tests pass.
Surfaced:
  • Design docs reference SDK shapes that drift (P-QoL.2 design assumed result.usage.total_tokens; reality has flat input_tokens/output_tokens ints on CompletionResult).
  • Adding a new setting takes 4 edits, not 3: Pydantic field + Infisical secret + deploy.sh path + Compose environment: forwarding.
  • Patching time.monotonic with iter([t0,t1]) exhausts mid-test because asyncio also calls it.
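One way to sidestep the time.monotonic pitfall recorded in this entry is to pass the clock in as a parameter, so a test can supply a fake clock without patching the function that asyncio itself calls. The sketch below is an illustration, not the executor's code: `run_timed` is a hypothetical name, `build_output_url` mirrors the described `_build_output_url` behaviour, and the `/?job_id=` URL shape is an assumption based on the P-QoL.1 design note about a job_id URL parameter.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional, Tuple


@dataclass(frozen=True)
class JobOutcome:
    """Handler result, with the fields named in the run log."""
    text: str
    tokens: int
    model: str


def build_output_url(public_base_url: Optional[str], job_id: str) -> Optional[str]:
    """Return a link to the job output, or None when no base URL is configured."""
    if not public_base_url:
        return None
    # Assumption: the UI selects a job through a job_id query parameter.
    return f"{public_base_url.rstrip('/')}/?job_id={job_id}"


async def run_timed(
    handler: Callable[[], Awaitable[JobOutcome]],
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[JobOutcome, float]:
    """Run a handler and measure its duration with an injectable clock."""
    started = clock()
    outcome = await handler()
    return outcome, clock() - started
```

In a test, `clock=lambda: next(ticks)` with `ticks = iter([10.0, 12.5])` is consumed exactly twice, because the event loop keeps using the real `time.monotonic`.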
2026-05-14
Documentation pass — designs, runbook, audits, SPEC reconciliation (Opus, parallel to P7.7)
Phase QoL formalised: six tasks P-QoL.1–P-QoL.6 added to tasks.md. P-QoL.1 (UI link to completed outputs) is new; P-QoL.2–.6 lift previously-parked items into trackable form.
Designs written:
  • platform-docs/design/p-qol-1-ui-output-link.md (URL param job_id, markdown-it-py renderer with html=False, right-pane swap pattern, 7 tests).
  • platform-docs/design/p-qol-2-discord-success-notifications.md (widen handler return to JobOutcome, executor measures duration externally, delete chain.py in-handler notify, optional PUBLIC_BASE_URL, 9 tests).
Runbook written: platform-docs/runbooks/smoke-test.md, a 10-step round-trip (tunnel, models, Davas, text job, RAG, KG client, MCP) + connectivity sweep + triage cheatsheet + exit-criteria checklist. KG client construct is the explicit P7.7 exit gate.
SPEC drift reconciliation: 7 fixes to SPEC.md: bucket names davidcockson-* (us-east-1), default model gemma4:e2b, layout additions (routes/ui.py, shared/notifications/), reindexer service, env vars (CLOUDFLARED_TUNNEL_TOKEN, PUBLIC_BASE_URL), CI deploy marked deferred.
Audits: tasks.md staleness pass (pruned ticked-but-verbose entries on P3.6, P4.4, P5.7); memory freshness pass; new memory project_monitoring_deferred.md.
2026-05-13
P7.3–P7.6 RAG + KG complete · UI model-name drift fixed (Sonnet/Opus)
P7.3: Neo4j + graphiti config wired; auth pulled from Infisical.
P7.4: kg_indexer.py + kg_retriever.py: graphiti ingest, Cypher neighbourhood query.
P7.5 (Opus): services/rag/graph.py LangGraph pipeline: decompose → parallel(vector, kg) → rerank → synthesise.
P7.6: nightly re-index cron (02:00 UTC) in reindexer container.
UI fix: /models endpoint was returning vendor-prefixed names (groq/…) that didn't match the router table; fixed in services/web/routes/ui.py (commit e117dba). Root cause: registry duplication between UI and router, tracked as P-QoL.4 for refactor.
Surfaced: Discord success-notifications regression vs old runner: non-chain completions are silent. Tracked as P-QoL.2 with full design.
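The shape of the P7.5 pipeline (decompose → parallel(vector, kg) → rerank → synthesise) can be sketched with plain asyncio. This is deliberately not LangGraph and not the code in services/rag/graph.py: every function body is a placeholder, and the splitting rule, scores, and prompt format are assumptions chosen only to make the data flow visible.

```python
import asyncio


async def decompose(question: str) -> list:
    # Placeholder: a real step would ask an LLM for sub-questions.
    return [part.strip() for part in question.split(" and ") if part.strip()]


async def vector_search(query: str) -> list:
    # Placeholder for the Qdrant retriever: (passage, score) pairs.
    return [(f"vector hit for {query!r}", 0.8)]


async def kg_search(query: str) -> list:
    # Placeholder for the Neo4j/graphiti retriever.
    return [(f"graph fact for {query!r}", 0.6)]


def rerank(hits: list, top_k: int = 4) -> list:
    # Deduplicate by text, keep the best score, return highest first.
    best = {}
    for text, score in hits:
        best[text] = max(score, best.get(text, 0.0))
    ordered = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [text for text, _ in ordered[:top_k]]


async def retrieve(question: str) -> list:
    sub_questions = await decompose(question)
    # Fan out: both retrievers run concurrently for every sub-question.
    tasks = [search(q) for q in sub_questions for search in (vector_search, kg_search)]
    results = await asyncio.gather(*tasks)
    return rerank([hit for hits in results for hit in hits])


def synthesise(question: str, context: list) -> str:
    # Placeholder: builds the prompt a final LLM call would receive.
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"
```

The fan-out/fan-in step is the part a graph framework formalises: the two retrievers are independent, so their latency overlaps instead of adding up.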
2026-05-12
P7.2 vector indexer + retriever (Sonnet)
Done: vector_indexer.py chunks vault .md files, batch-embeds via Ollama nomic-embed-text, upserts to Qdrant with deterministic SHA-256 point IDs. vector_retriever.py uses qdrant.query_points() (qdrant-client 1.18 API — search() removed). 30 tests, ruff + mypy --strict clean. MR merged. Surfaced: qdrant-client 1.18 replaced QdrantClient.search() with query_points() returning QueryResponse.points.
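Deterministic point IDs make re-indexing idempotent: the same chunk always upserts onto the same point instead of creating a duplicate. A minimal sketch of the idea follows; the key format (`path#index`), chunk size, and overlap are assumptions, not values from vector_indexer.py. Qdrant point IDs must be unsigned integers or UUIDs, so the SHA-256 digest is folded into a UUID here.

```python
import hashlib
import uuid


def point_id(path: str, chunk_index: int) -> str:
    """Derive a stable UUID string from a file path and chunk index."""
    digest = hashlib.sha256(f"{path}#{chunk_index}".encode("utf-8")).digest()
    # Use the first 16 bytes of the digest as the UUID payload.
    return str(uuid.UUID(bytes=digest[:16]))


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list:
    """Split text into overlapping character windows."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
        start += size - overlap
    return chunks
```

Hashing the path and index keeps IDs stable when content changes; hashing the chunk text instead would give content-addressed IDs, at the cost of leaving stale points behind after an edit.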
2026-05-12
P7.1 Qdrant collections init + P6.6 research mode wired (Sonnet)
P7.1: idempotent init_collections.py creates vault, web, memory (768-dim cosine). Uses os.environ directly, which avoids the VAULT_PATH requirement in init containers. MR!21 merged.
P6.6: shared/mcp_client.py MCPToolClient wraps FastMCP StreamableHttpTransport. Research jobs call web_search() via MCP, fall back gracefully when MCP unavailable.
Pre-integration audit found vault mount :ro in MCP container silently breaking vault_write since P6.4; fixed here.
2026-05-12
P6.5 sandboxed exec + P6.4 vault tools + P6.3 fetch + P6.2 search (Sonnet/Opus)
P6.5 (Opus): run_python tool with rlimits (CPU, AS, FSIZE, NOFILE) + scrubbed env + start_new_session=True + optional bubblewrap when bwrap present. 11 tests.
P6.4: vault_read/write/list with path traversal blocked via Path.resolve().is_relative_to().
P6.3: fetch_page: trafilatura sync batched into asyncio.to_thread. 50k char cap.
P6.2: web_search: ddgs primary (sync wrapped), Tavily fallback via httpx.
Surfaced: from server import mcp created a separate module instance from services.mcp.server, so tools were registered on different objects. Fixed with full dotted import path.
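The traversal guard named for P6.4 relies on resolving the candidate path first and only then checking containment. A minimal sketch of that pattern is below; `resolve_in_vault` and `VaultPathError` are hypothetical names, not the vault tools' actual API.

```python
from pathlib import Path


class VaultPathError(ValueError):
    """Raised when a requested path would land outside the vault root."""


def resolve_in_vault(vault_root: Path, user_path: str) -> Path:
    """Resolve user_path under vault_root, rejecting anything that escapes it."""
    root = vault_root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    # An absolute user_path replaces the root in the join and fails the check.
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):
        raise VaultPathError(f"path escapes vault: {user_path}")
    return candidate
```

Checking the string for ".." without resolving would miss symlinks and absolute paths, which is why the resolve-then-compare order matters.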
2026-05-11
P5.7 runner end-to-end green + P6.1 FastMCP server (Sonnet)
P5.7: control.davidcockson.com live. Worker Dockerfile fixed (copy services/+shared/, PYTHONPATH=/app). Secrets: no secrets in .env; all via infisical run --recursive. cloudflared shows unhealthy in docker compose ps but tunnel IS connected (4 edge connections); healthcheck timing issue only.
P6.1: FastMCP v3 server boots, ping tool, OTel lifespan. 11 tests.
Surfaced: Secrets leaked into chat via grep before cleaning .env; rotate Anthropic, Google AI, Groq, Discord keys. Old uvicorn killed; new runner owns hostname permanently.
2026-05-11
P5.1–P5.6 Web + Chat complete · P4.1–P4.6 Worker complete · P2.2–P2.4 Alloy rolled (Sonnet)
P5: FastAPI factory + lifespan, health probes (Ollama/Davas/Qdrant/Neo4j), SSE streaming, queue submit/list, HTMX UI (chat + queue + model picker + Davas dot), SQLite chat history.
P4: Atomic poller with 2s mtime guard + crash recovery, executor with traceback-in-frontmatter failures, text/chain/research job types, Discord notifications, OTel spans on every provider call.
P2: Full Grafana Cloud Alloy pipeline (Loki/Mimir/Tempo) deployed to Contabo and homelab. Loki + Mimir shipping; OTLP port remapped to 14317 (Langfuse holds 4317 until P9).
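The P4 poller's three properties (atomic claim, 2s mtime guard, crash recovery) map naturally onto a directory-per-state layout. The sketch below shows one way that could work; the `pending/` and `running/` directory names and the `.md` job extension are assumptions, and it is not the worker's actual implementation.

```python
import os
import time
from pathlib import Path
from typing import Optional

MTIME_GUARD_SECONDS = 2.0  # skip files modified too recently (may still be syncing)


def claim_next(pending: Path, running: Path, now: Optional[float] = None) -> Optional[Path]:
    """Move the oldest settled job file from pending/ to running/ and return it."""
    now = time.time() if now is None else now
    candidates = []
    for job in pending.glob("*.md"):
        try:
            mtime = job.stat().st_mtime
        except FileNotFoundError:
            continue  # vanished between listing and stat
        if now - mtime >= MTIME_GUARD_SECONDS:
            candidates.append((mtime, job))
    for _, job in sorted(candidates, key=lambda c: c[0]):
        target = running / job.name
        try:
            os.rename(job, target)  # atomic when both dirs share a filesystem
        except FileNotFoundError:
            continue  # vanished before the claim
        return target
    return None


def recover_crashed(pending: Path, running: Path) -> int:
    """On startup, return jobs left in running/ by a crashed worker to pending/."""
    count = 0
    for job in running.glob("*.md"):
        os.rename(job, pending / job.name)
        count += 1
    return count
```

The rename is what makes the filesystem usable as a queue: a job file is in exactly one state directory at any moment, so a crash can strand a job in `running/` but cannot duplicate it.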
2026-05-10
P3.1–P3.7 Runner foundation + P1.1–P1.6 Infrastructure + P0.1–P0.6 Foundations (Sonnet/Opus)
P3: pyproject.toml + uv, Pydantic Settings v2, provider clients (Ollama/Groq/Gemini/Anthropic), router with Davas health-check rule, OTel SDK → Tempo, docker-compose (7/8 healthy locally), infisical deploy wrapper.
P1: Terraform S3 backend + Cloudflare DNS, Ansible firewall capture (Contabo iptables snapshot), infra-shared roles (docker/tailscale/infisical-agent/alloy), contabo + homelab playbooks (idempotent, --check green).
P0: Infisical secrets confirmed, machine identities verified, AWS IAM confirmed, S3 versioning confirmed, 5 GitLab repos created + CI green, local clones done.