Project Plan Internal · v2026.05.08 SPEC.md · tasks.md · RUN-SUMMARY.md

LLM Runner Rebuild.

A clean, modular Python runner: codified infrastructure, hybrid graph + vector retrieval, single-person observability via Grafana Cloud. Ten phases (P0–P9), one solo operator, one filesystem-as-message-queue.

Owner: Dave Cockson
Started: 2026-05-08
Targets: Contabo VPS · Homelab · Davas
Doc revision: 2026-05-08 · §17
§00

Status at a glance

Source · RUN-SUMMARY
Current phase: Phase 0 (Foundations), in progress
Last session: P0.3 done (AWS IAM homelab-svc verified)
Blockers: None (old runner still cohabiting)
Next up: P0.4 → P0.5 → P0.6 (S3 versioning · GitLab repos · clones)
Up next — pick up here
  1. P0.4 — S3 bucket gap-fill; verify versioning is Enabled on davidcockson-tfstate.
  2. P0.5 — Create five GitLab repos (infra-contabo, infra-homelab, infra-shared, llm-runner, platform-docs) with baseline CI + main-branch protection.
  3. P0.6 — Clone all five locally under ~/Documents/GitHub/rebuild-v2-repos/; git status clean.
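The versioning check in P0.4 can be scripted. The sketch below assumes boto3 is available and that credentials for homelab-svc are already in the environment; the helper names are hypothetical. S3's GetBucketVersioning response omits the Status key for a bucket that has never had versioning configured, so a missing key is treated as "not enabled".

```python
from typing import Any, Mapping

TFSTATE_BUCKET = "davidcockson-tfstate"  # bucket named in step P0.4


def versioning_enabled(response: Mapping[str, Any]) -> bool:
    """Return True only when a GetBucketVersioning response reports 'Enabled'.

    A missing Status key (never configured) and 'Suspended' both count as
    not enabled, so the check fails loudly instead of passing by accident.
    """
    return response.get("Status") == "Enabled"


def check_bucket(bucket: str = TFSTATE_BUCKET) -> bool:
    """Ask S3 for the bucket's versioning state (needs boto3 and credentials)."""
    import boto3  # imported lazily so the pure helper above has no dependencies

    client = boto3.client("s3", region_name="us-east-1")
    return versioning_enabled(client.get_bucket_versioning(Bucket=bucket))
```

Calling check_bucket() returns True or False for the tfstate bucket; the pure helper can be unit-tested without touching AWS.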
§01–02

Goals & non-negotiables

Why we're doing this · how we won't
Goals
  1. Rebuild the LLM runner as a clean, modular Python codebase Dave can extend solo.
  2. Codify all infrastructure (Contabo + homelab) in version-controlled IaC.
  3. Hybrid multi-agent system with both graph and vector retrieval — supports projects like pickles-gmbh-ai-governance-framework.
  4. Single-person observability via Grafana Cloud (logs + traces + metrics) with Discord alerts.
  5. Demonstrate industry-standard tooling for hiring conversations.
Non-negotiable rules
  • One job per file. 150 lines max per Python file.
  • Every module has a mirrored test in tests/.
  • No silent failures — errors surface explicitly.
  • No raw os.environ — Pydantic Settings v2, validated at startup.
  • No secrets in code, config, or git. Infisical → materialised .env.
  • If a file imports from >3 internal modules, split it.
  • Conventional commits: feat: fix: chore: test: docs: ci:
  • pytest only. No unittest, no inline test scripts.
  • requirements.txt is the deploy artifact — generated by uv pip compile.
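Two of the rules above (the 150-line cap and the limit of three internal imports) are mechanical enough to check in CI. The following is a minimal sketch, not part of the spec: it assumes the internal top-level packages are services, shared, and agents (taken from the llm-runner tree), and it ignores relative imports.

```python
import ast
from pathlib import Path

MAX_LINES = 150            # "150 lines max per Python file"
MAX_INTERNAL_IMPORTS = 3   # "imports from >3 internal modules, split it"
# Assumption: these are the internal top-level packages of llm-runner.
INTERNAL_ROOTS = {"services", "shared", "agents"}


def internal_modules(source: str) -> set[str]:
    """Collect the distinct internal modules that a source file imports."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue  # not an import, or a relative import (ignored here)
        found.update(n for n in names if n.split(".")[0] in INTERNAL_ROOTS)
    return found


def violations(source: str) -> list[str]:
    """Return human-readable rule violations for one file's source text."""
    problems = []
    line_count = len(source.splitlines())
    if line_count > MAX_LINES:
        problems.append(f"{line_count} lines (max {MAX_LINES})")
    modules = internal_modules(source)
    if len(modules) > MAX_INTERNAL_IMPORTS:
        problems.append(
            f"imports {len(modules)} internal modules (max {MAX_INTERNAL_IMPORTS})"
        )
    return problems


def check_tree(root: Path) -> dict[str, list[str]]:
    """Map each offending .py file under root to its list of violations."""
    report = {}
    for path in sorted(root.rglob("*.py")):
        found = violations(path.read_text(encoding="utf-8"))
        if found:
            report[str(path)] = found
    return report
```

A CI job could call check_tree(Path(".")) and exit non-zero when the returned report is non-empty.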
§03

Tech stack

Pinned choices
Layer | Choice
Language | Python 3.12
Package manager | uv → requirements.txt
Lint / format / types | ruff + mypy --strict
Web framework | FastAPI
Async | asyncio throughout
Config | Pydantic Settings v2
LLM orchestration | LangGraph
Vector store | Qdrant
Graph store | Neo4j Community
Agent KG library | graphiti-core (Zep)
MCP | FastMCP (jlowin)
Embeddings | nomic-embed-text via Ollama
Tracing | OTel SDK → Alloy → Tempo
Logs | stdlib → Alloy → Loki
Metrics | Prometheus → Alloy → Mimir
Alerts | Grafana Cloud → Discord
Testing | pytest + pytest-asyncio + httpx
Containers | Docker Compose
Secrets | Infisical Cloud (EU)
IaC (Contabo) | Terraform + Ansible
IaC (Homelab) | Ansible only
Object storage | AWS S3 (us-east-1)
CI/CD | GitLab CI
Public ingress · VPN | Cloudflare Tunnel · Tailscale
§05

Phase order

10 phases, sequential, gated
Phase | Outcome | Status
Pre-flight | Reconciliation answers PF.1–PF.6: Infisical, S3, GitLab/Gitea, runner cohabitation, Langfuse. | done
P0 | Infisical project + machine identities. AWS + 3 S3 buckets. GitLab repos with baseline CI. | in progress
P1 | Terraform state in S3. Contabo provisioned (firewall, Cloudflare DNS). Hosts bootstrapped via Ansible. | not started
P2 | Observability live: Alloy on both hosts shipping logs+metrics+traces; Discord alerts wired. | not started
P3 | Runner foundation: config, providers, Docker Compose boot. | not started
P4 | Worker: queue poller, executor, job types, Discord notifications. | not started
P5 | Web UI + chat at control.davidcockson.com. | not started
P6 | MCP server, tools, research mode. | not started
P7 | RAG (Qdrant) + KG (Neo4j + graphiti) hybrid retrieval. | not started
P8 | Scheduled agents, usage tracking. | not started
P9 | Cutover: vault migration, decommission old runner, snapshot to S3. | not started
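P7 combines vector hits from Qdrant with graph hits from Neo4j, but the plan does not say how the two result lists are merged. One common option is reciprocal rank fusion (RRF), sketched here purely as an assumption; the document IDs and the constant k = 60 are illustrative, not taken from the spec.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists; each hit scores 1 / (k + rank), rank starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending, then by ID so ties are deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["doc-a", "doc-b", "doc-c"]  # hypothetical Qdrant result IDs
graph_hits = ["doc-b", "doc-d"]            # hypothetical Neo4j result IDs
fused = reciprocal_rank_fusion([vector_hits, graph_hits])
print(fused)  # doc-b ranks first because both retrievers returned it
```

RRF needs only ranks, not comparable scores, which is convenient when one retriever returns cosine similarities and the other returns graph matches with no score at all.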
§07 · §10

Repository & runtime layout

Five repos · eight services
// gitlab.com/davidcockson/
├── infra-contabo     terraform + ansible
├── infra-homelab     ansible only
├── infra-shared      reusable roles
├── llm-runner        ⤵ python runner
└── platform-docs     runbooks + dashboards

// llm-runner/
├── services/
│   ├── web/        FastAPI + HTMX UI
│   ├── worker/     queue poller + executor
│   ├── mcp/        FastMCP tool server
│   └── rag/        LangGraph hybrid retrieval
├── shared/
│   ├── providers/  ollama·groq·gemini·anthropic
│   ├── config/     Pydantic Settings v2
│   ├── observability/
│   └── models/
├── agents/          briefing · research · memory
├── tests/           mirrors source tree
├── docker-compose.yml
├── pyproject.toml   uv-managed
└── requirements.txt generated, committed
Service | Port | Role
web | :8000 | FastAPI UI + chat
worker | - | Queue poller + executor
mcp | :8001 | FastMCP tool server
rag | :8002 | LangGraph hybrid retrieval
qdrant | :6333 | Vector store
neo4j | :7687 | Graph store (bolt)
cloudflared | - | Public tunnel
alloy | - | Logs / metrics / traces shipper
Ollama runs on the host at 127.0.0.1:11434, accessed via host.docker.internal. No ports exposed publicly. Volumes: obsidian-vault (bind to Syncthing path), qdrant-data, neo4j-data.
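The worker service polls a filesystem queue. A minimal sketch of that pattern follows, assuming a layout with pending/ and processing/ subdirectories and JSON job files; both the layout and the function names are hypothetical. The key idea is that a rename within one filesystem is atomic, so a job is either fully visible or not, and only one poller can claim it.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def enqueue(queue_dir: Path, job_id: str, payload: dict) -> Path:
    """Write a job atomically: write a temp file, then rename it into pending/."""
    pending = queue_dir / "pending"
    pending.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=pending, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle)
    target = pending / f"{job_id}.json"
    os.replace(tmp_name, target)  # pollers never see a half-written job
    return target


def claim_next(queue_dir: Path) -> Optional[tuple[Path, dict]]:
    """Claim the first pending job (by filename order) by moving it to processing/."""
    pending = queue_dir / "pending"
    processing = queue_dir / "processing"
    processing.mkdir(parents=True, exist_ok=True)
    if not pending.is_dir():
        return None
    for candidate in sorted(pending.glob("*.json")):
        claimed = processing / candidate.name
        try:
            os.rename(candidate, claimed)  # atomic on a single filesystem
        except FileNotFoundError:
            continue  # another poller claimed this job first
        return claimed, json.loads(claimed.read_text(encoding="utf-8"))
    return None
```

A poller loop would call claim_next() on an interval, run the executor on the returned payload, and then move the file to a done or failed directory so that errors surface explicitly.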
§12

Observability pipeline

One pipeline · Grafana Cloud · Discord
Sources: runner code (OTel SDK) · container logs (stdlib logging) · host metrics (node_exporter · cAdvisor)
Shipper: Alloy on contabo + homelab (single shipper)
Backends: Grafana Cloud Tempo (traces) · Grafana Cloud Loki (logs) · Grafana Cloud Mimir (metrics)
Alerts: Discord (job · disk · davas)
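For the logs leg of the pipeline, the runner only needs stdlib logging that writes structured lines to stdout. The sketch below assumes Alloy collects container stdout and that one JSON object per line is an acceptable format; the field names are illustrative, not specified by the plan.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Keep the traceback so failures are never silent.
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def configure_logging(level: int = logging.INFO) -> None:
    """Route all root-logger output to stdout as JSON lines."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers = [handler]
    root.setLevel(level)
```

After configure_logging(), a call such as logging.getLogger("worker").error("job %s failed", job_id) emits one JSON line that a log shipper can parse without custom regexes.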
§06

Model routing

Provider · purpose · quirks
Model | Provider | When to use | Quirk to remember
qwen2.5:14b | Ollama (Contabo) | Default chat + jobs · always-on | localhost via host.docker.internal
gemma4:27b | Ollama (Davas) | Project mode · deep work | UI greys out when health-check unreachable; never silently fall back
llama-3.3-70b-versatile | Groq | Fast iteration | Log 429s clearly; 60s timeout
models/gemini-2.5-flash | Gemini | Long context, planning | Needs models/ prefix; streaming token count cosmetic 0
qwen/qwen3-32b | Groq | Coding | Same Groq guards as above
claude-sonnet-4-6 | Anthropic | Paid · sparingly | max_tokens required; default 8192
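The routing table and its quirks can be captured as data plus one selection function. This is a sketch under assumptions: the purpose keys ("default", "project", and so on), the Route type, and the is_healthy callback are hypothetical names, while the model IDs, the 60s Groq timeout, and the 8192 max_tokens default come from the table above. The important behaviour is that an unreachable route raises instead of silently falling back.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Route:
    model: str
    provider: str
    host: str = ""  # only meaningful for Ollama routes


# Purpose keys are hypothetical labels for the "When to use" column.
ROUTES = {
    "default": Route("qwen2.5:14b", "ollama", "contabo"),
    "project": Route("gemma4:27b", "ollama", "davas"),
    "fast": Route("llama-3.3-70b-versatile", "groq"),
    "long-context": Route("models/gemini-2.5-flash", "gemini"),
    "coding": Route("qwen/qwen3-32b", "groq"),
    "paid": Route("claude-sonnet-4-6", "anthropic"),
}


class RouteUnavailable(RuntimeError):
    """Raised when the chosen route fails its health check."""


def select_route(purpose: str, is_healthy: Callable[[Route], bool]) -> Route:
    """Return the route for a purpose, or raise; never substitute another model."""
    route = ROUTES[purpose]
    if not is_healthy(route):
        raise RouteUnavailable(f"{route.model} on {route.provider} is unreachable")
    return route


def request_defaults(route: Route) -> dict:
    """Provider-specific request parameters taken from the quirks column."""
    if route.provider == "anthropic":
        return {"max_tokens": 8192}  # max_tokens is required by Anthropic
    if route.provider == "groq":
        return {"timeout": 60}       # seconds
    return {}
```

The UI can catch RouteUnavailable to grey out the model, which keeps the "no silent fallback" rule in one place instead of scattering it across providers.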
§14

Phase task lists

From tasks.md · ticked when done
Sonnet: default execution
Opus: subtle / cross-cutting
Haiku: mechanical / scaffolding
⚠ two-strikes → escalate
PF

Pre-flight reconciliation

Spec ↔ live infra · all answered
ID | Question | Resolution
PF.1 | Existing Infisical project name | Reuse homelab-rebuild (org davidcockson); only prod env in active use throughout the rebuild.
PF.2 | S3 bucket inventory | Three buckets confirmed in us-east-1: davidcockson-tfstate, -vault-snapshots, -artifacts (created 2026-05-08). IAM updated.
PF.3 | Backup restore drill | contabo-backup-v1.tar.gz (1.5 GB) extracted clean. Runbook at platform-docs/runbooks/restore-from-s3.md.
PF.4 | GitLab vs Gitea | GitLab.com for all rebuild repos. Gitea retained for Obsidian vault only.
PF.5 | Old runner cohabitation | New runner uses /root/obsidian-vault-v2 on Contabo until P9 cutover. No separate VM needed.
PF.6 | Langfuse decision | Langfuse stays for old runner only; torn down at P9 cutover (snapshot first).
§15 · §16

Scope guardrails

What we are not building
Out of scope · MVP
  • Kubernetes / K3s
  • MemPalace (replaced by Qdrant + Neo4j)
  • SearXNG
  • Langfuse (replaced by OTel + Tempo)
  • HuggingFace provider
  • OpenRouter
  • Voice interface, image generation, fine-tuning
  • mkdocs site (replacement TBD)
Open items to revisit
  • Documentation tool (mkdocs replacement)
  • Proxmox-Terraform later (currently Ansible-only)
  • Voice / image gen / fine-tuning post-MVP
log

Run log

Newest first · append-only
2026-05-08
P0.3 AWS IAM confirmed (homelab-svc)
Outcome: done. Changed: nothing new; confirmed that homelab-svc already exists with a policy scoped to all 3 buckets, and that its keys live in Infisical at aws/AWS_ACCESS_KEY_ID. The name terraform used in the spec was aspirational. Next: P0.4, verify versioning on the tfstate bucket.
2026-05-08
P0.2 Infisical machine identities verified
Created contabo-prod + homelab-prod with Universal Auth + Viewer on homelab-rebuild. Surfaced: account is on EU Infisical region — CLI commands need --domain https://eu.infisical.com; infisical run injects 0 secrets when secrets are in subfolders (revisit at P3.7 with --recursive).
2026-05-08
P0.1 Infisical prod secrets confirmed
All 7 required secrets verified via infisical secrets --env=prod --recursive. NEO4J_PASSWORD placeholdered until P7. Surfaced: CLAUDE_API_KEY duplicates ANTHROPIC_API_KEY — spec name wins.
2026-05-08
Spec + tasks + run-summary scaffolded
Wrote tasks.md from SPEC.md + homelab current-state. Added Pre-flight reconciliation block (PF.1–PF.6) for spec ↔ live-infra gaps. Next: answer PF.1–PF.6, then start P0.1.