control.davidcockson.com — LLM job runner & chat platform

01 · Overview

What it is, and why I built it

A private control surface for running LLM jobs against models that live on hardware I own, exposed safely to the public internet, and observable end-to-end. Built deliberately to exercise the full lifecycle: infrastructure, application, UI, observability, and operations.

control.davidcockson.com is the front door to a small but complete LLM platform. The browser talks to a FastAPI service running on a Contabo VPS; submitted prompts become jobs on a filesystem-backed queue, are executed by a separate worker process, and stream responses back over Server-Sent Events. Chat history, job metadata, and per-job token usage are persisted to SQLite.
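The SSE leg of that pipeline can be illustrated with a small framing helper. This is a minimal sketch, not the platform's actual code: the function names, the event names (`token`, `done`), and the payload fields are all assumptions made for illustration.

```python
import json
from typing import Iterable, Iterator, Optional


def sse_event(data: dict, event: Optional[str] = None) -> str:
    """Frame one payload as a Server-Sent Events message."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps (without indent) never emits a raw newline, so a single
    # "data:" line is always enough to carry the payload.
    lines.append(f"data: {json.dumps(data)}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


def stream_job(job_id: str, tokens: Iterable[str]) -> Iterator[str]:
    """Yield one SSE frame per token, then a final 'done' event."""
    for index, token in enumerate(tokens):
        payload = {"job_id": job_id, "index": index, "token": token}
        yield sse_event(payload, event="token")
    yield sse_event({"job_id": job_id}, event="done")
```

In a FastAPI route, a generator like `stream_job` could be wrapped in `StreamingResponse(..., media_type="text/event-stream")`, which lets the browser consume it with a standard `EventSource`.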

Heavy inference is offloaded to Davas, a home-lab GPU box reached over a Tailscale-secured tunnel, which keeps latency low and GPU costs at zero. Public access is mediated by a Cloudflare Tunnel, so no inbound port is ever exposed to the open internet.

The platform also includes an MCP tool server (search, fetch, vault, sandboxed code), and a hybrid RAG service combining Qdrant vector search with a hand-rolled Neo4j knowledge graph. Agent Skills — small user-authored packages — can be selected per job to shape the model's behavior.
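The text does not say how the vector and graph results are combined, so the following is only one plausible approach: reciprocal rank fusion (RRF), a common way to merge ranked lists from different retrievers. The function name, the document ids, and the constant `k = 60` are assumptions; no Qdrant or Neo4j calls are shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def reciprocal_rank_fusion(
    rankings: List[List[str]], k: int = 60
) -> List[Tuple[str, float]]:
    """Merge several ranked id lists into one list ordered by fused score.

    Each list contributes 1 / (k + rank) per document, with rank starting
    at 1, so documents ranked highly by several retrievers rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Given hypothetical vector hits `["a", "b", "c"]` and graph hits `["b", "d"]`, the document `b` ranks first because both retrievers returned it.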

Every service is declared in docker-compose, every secret resolved through Infisical at runtime, every host bootstrapped by Ansible, and every piece of state-bearing infrastructure tracked in Terraform with S3-backed state. The whole platform emits logs, metrics, and OpenTelemetry traces to Grafana Cloud.
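Runtime secret resolution usually means the application only ever sees environment variables that were injected when the process started. A minimal sketch of a fail-fast settings loader under that assumption follows; the variable names and the `Settings` fields are hypothetical, since the text does not list the real ones.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Hypothetical variable names; the real ones are not given in the text.
REQUIRED_VARS = {
    "database_path": "CONTROL_DB_PATH",
    "inference_base_url": "DAVAS_BASE_URL",
    "provider_api_key": "PROVIDER_API_KEY",
}


class MissingSecretError(RuntimeError):
    """Raised when a required variable was not injected into the process."""


@dataclass(frozen=True)
class Settings:
    database_path: str
    inference_base_url: str
    provider_api_key: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from the environment, failing fast on any gap."""
    missing = sorted(name for name in REQUIRED_VARS.values() if not env.get(name))
    if missing:
        raise MissingSecretError("missing required variables: " + ", ".join(missing))
    return Settings(**{field: env[name] for field, name in REQUIRED_VARS.items()})
```

A process launched through `infisical run -- <command>` receives its secrets as environment variables, so a loader of this shape never needs to read a secrets file from disk.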

Constraint that shaped everything: built solo on a fixed budget — a single VPS plus existing home-lab hardware. No managed services where a self-hosted one would do. The result is a system small enough to fully understand, but production-shaped in every operational dimension.

02 · Project Flow

How a request travels through the system

A browser request crosses four trust boundaries before a token comes back: Cloudflare's edge, the VPS, a Tailscale tunnel, and finally the home-lab GPU. Each hop adds observability; none adds an open port.

Figure 1. End-to-end request flow (no open inbound ports).

1. Edge. The browser hits a Cloudflare-managed hostname. Cloudflare handles TLS termination, DDoS mitigation, and rate limiting; cloudflared opens an outbound-only tunnel from the VPS, so no inbound port is open on the host.

2. Web tier. A FastAPI process serves the React SPA and exposes the JSON + SSE API. Requests that need work write a JSON job file to _queue/; SSE streams events as the worker progresses.

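3. Worker. A separate process moves jobs through _queue → _active → _completed / _failed with a single shutil.move call per transition, never copy + delete. Each move is an atomic rename as long as the state directories share a filesystem; across filesystems shutil.move falls back to copy-and-delete. A crash leaves a half-done job in _active/, which the worker requeues on startup.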

4. Inference. The worker calls Davas over Tailscale for local models, or a cloud provider for frontier models. Either way, the token stream is forwarded straight through to the browser over SSE.
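The queue transitions in step 3 can be sketched as a handful of renames. This is an illustration under stated assumptions, not the worker's real code: the function names and the `.json` job naming are hypothetical, and the sketch calls `os.replace` directly (the rename primitive that `shutil.move` relies on when source and destination share a filesystem) so that a cross-filesystem layout fails loudly instead of silently degrading to copy-and-delete.

```python
import json
import os
import uuid
from pathlib import Path
from typing import Optional

STATES = ("_queue", "_active", "_completed", "_failed")


def init_dirs(root: Path) -> None:
    """Create the four state directories under one root."""
    for state in STATES:
        (root / state).mkdir(parents=True, exist_ok=True)


def submit(root: Path, payload: dict) -> Path:
    """Write the job to a temp name, then rename it into view."""
    job_id = uuid.uuid4().hex
    tmp = root / "_queue" / f".{job_id}.tmp"
    final = root / "_queue" / f"{job_id}.json"
    tmp.write_text(json.dumps(payload))
    os.replace(tmp, final)  # pollers never see a half-written file
    return final


def transition(job: Path, root: Path, state: str) -> Path:
    """Move a job file into another state directory with one rename."""
    target = root / state / job.name
    os.replace(job, target)
    return target


def claim_next(root: Path) -> Optional[Path]:
    """Claim one queued job by renaming it into _active/."""
    # Order is by file name here, which is arbitrary for uuid-based names.
    for job in sorted((root / "_queue").glob("*.json")):
        try:
            return transition(job, root, "_active")
        except FileNotFoundError:
            continue  # another worker renamed it first
    return None


def requeue_orphans(root: Path) -> int:
    """On startup, push jobs left in _active/ by a crash back to _queue/."""
    orphans = list((root / "_active").glob("*.json"))
    for job in orphans:
        transition(job, root, "_queue")
    return len(orphans)
```

Because each transition is a single rename, a job file is always in exactly one state directory, which is what makes the startup requeue safe to run unconditionally.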

03 · Interface

The terminal-inspired UI

After two UI iterations — a fast HTMX prototype, then a typed React rewrite — the interface settled on a calm phosphor-green aesthetic. JetBrains Mono throughout, three columns, keyboard-first.

**Chat view** (screenshot: chat view with session sidebar and inspector). Session list (left), streaming chat (center), inspector with model, tokens, and job state (right). SSE keeps tokens flowing as the worker yields them.

**Job detail** (screenshot: generated article view with metadata sidebar). Markdown-rendered output with metadata, token usage, and skill provenance. Every job is addressable, replayable, and diffable against sibling runs.

05 · Delivery

Plan vs reality — ten days, fourteen phases

The original plan was P0 → P9. What shipped includes those, plus four unplanned phase blocks added in flight when the work demanded them — polish (QoL), user-defined skills, and two UI rewrites.

| Phase | Planned | Delivered | When |
|---|---|---|---|
| P0 | Infisical, machine identities, AWS+S3, GitLab repos with baseline CI. | All 6 sub-tasks done; reconciliation block bridged spec vs live infra. | May 08 · 2d |
| P1 | Terraform S3 backend; Contabo + home-lab bootstrapped via Ansible. | All 6 done in one push. Contabo firewall captured as Ansible role. | May 10 · 1d |
| P2 | Alloy on both hosts → Grafana Cloud; dashboards + Discord alerts. | Pipeline up; 4 dashboards + 4 alerts (spec called for 3). | May 11 → 14 |
| P3 | Settings, providers, OTel tracing, docker-compose, infisical-run. | All 7 done. Default model corrected mid-flight. | May 10 → 11 |
| P4 | Atomic queue poller, executor, handlers, Discord, OTel spans. | All 6 done. Executor refactored later to own notifications cleanly. | May 11 · 1d |
| P5 | FastAPI factory, /health, SSE chat, HTMX UI, Cloudflare tunnel swap. | UI live behind control.davidcockson.com; later rebuilt twice. | May 11 · 1d |
| P6 | FastMCP server; tools: search, fetch, vault, sandboxed code-exec. | All 6 done. Health route bug caught and fixed. | May 11 → 14 |
| P7 | Qdrant + graphiti KG over Neo4j; LangGraph hybrid pipeline. | graphiti dropped after 3 cloud failures → hand-rolled extractor. | May 13 → 14 |
| P8 | Token tracker + briefing / research / memory agents. | 3 of 4 held (no concrete trigger); only token-tracker shipped. | May 14 |
| P9 | Vault snapshot, decommission old runner, runbook, security audit. | Cutover done; old runner archived; security audit passed. | May 15 · 1d |
| +QoL | (added in flight) | 18 polish tasks, including Discord regression, deploy wrapper, secrets manifest, RAG context-only mode. | May 13 → 14 |
| +Skills | (added in flight) | Agent Skills: user-authored skill packages selectable per job, live-reloaded. | May 14 |
| +UI-1 | (added in flight) | Terminal/TUI redesign: JetBrains Mono, 3-pane grid, keyboard layer. | May 14 → 15 |
| +UI-2 | (added in flight) | Full React port: Vite + TS, virtualised queue, ~3,000 LOC of old UI deleted. | May 16 → 17 |

Engineering lesson worth keeping: the unplanned phases all came from using the system, not designing it. Throwing away two UI iterations was cheaper than guessing right the first time — because the queue, worker, and SSE contract underneath stayed unchanged through every rewrite.

06 · Stack

Technologies, top to bottom

Deliberate choices: typed everywhere, declarative where possible, no managed-service lock-in. Everything in this list is either open-source or has a self-hosted equivalent.

Front-end

React 18
TypeScript
Vite
SSE streaming

Back-end

Python 3.12
FastAPI
FastMCP
LangGraph

Data & AI

Ollama self-hosted
Qdrant vectors
Neo4j KG
SQLite

Infrastructure

Terraform
Ansible
Docker Compose
AWS S3 tfstate

Networking

Cloudflare Tunnel
Tailscale
iptables-nft

Secrets

Infisical EU
Machine identities
Runtime resolution

Observability

Grafana Alloy
Loki · Mimir · Tempo
OpenTelemetry
Discord alerts

Workflow

GitLab CI
Syncthing
Obsidian vault
Claude Code

07 · Methodology

How it was actually built

The work was run as a tightly-scoped, spec-driven, AI-augmented engineering loop. Every change was specified, sized, executed, tested, and logged before the session closed. The numbers below are pulled directly from the repository and from the Claude Code session logs.

Tooling. Built entirely with Claude Code as the driver, running primarily against claude-sonnet-4-6 for implementation work, with claude-opus-4-7 reserved for architecture decisions, complex debugging, and any session that touched the cutover-sensitive path.

Process. Three durable artefacts drove the entire ten-day build: SPEC.md as the contract, tasks.md as the work breakdown (≈120 atomic tasks across 14 phases), and an append-only RUN-SUMMARY.md as the cross-session ledger — every session ended with a log entry covering tasks worked, outcome, what changed, what surfaced, and what the next session should pick up. This made resuming mid-phase from either machine trivial.

Quality gate. A 10-stage make pre-push pipeline ran on every push, with stages including lint, type-check (mypy strict), unit + integration tests, Docker build, and a container health smoke test. A push only landed when all ten stages were green, typically in 100 to 200 seconds.
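The gate's core behaviour, running stages in order and stopping at the first failure, can be sketched in a few lines. The real pipeline is driven by make and its exact stage commands are not given, so the runner below and any commands passed to it are hypothetical.

```python
import subprocess
import time
from typing import List, Sequence, Tuple

Stage = Tuple[str, Sequence[str]]


def run_gate(stages: Sequence[Stage]) -> List[Tuple[str, bool]]:
    """Run each stage in order and stop at the first non-zero exit code.

    Returns (stage name, passed) for every stage that actually ran, so a
    failed gate shows exactly where it stopped.
    """
    results: List[Tuple[str, bool]] = []
    for name, command in stages:
        started = time.monotonic()
        completed = subprocess.run(command)
        elapsed = time.monotonic() - started
        passed = completed.returncode == 0
        print(f"[{'ok' if passed else 'FAILED'}] {name} ({elapsed:.1f}s)")
        results.append((name, passed))
        if not passed:
            break  # later stages are skipped, mirroring a failing make target
    return results


def gate_is_green(stages: Sequence[Stage]) -> bool:
    """True only when every stage ran and every stage passed."""
    results = run_gate(stages)
    return len(results) == len(stages) and all(ok for _, ok in results)
```

Ordering cheap checks such as lint and type-check ahead of the Docker build keeps the failure path short, which matters when the full green run already takes a couple of minutes.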

Discipline. No commit without a passing pre-push gate. No phase ticked complete without its test coverage. No deferred work without an explicit follow-up task. Roughly half of the eventual scope was discovered during execution — the unplanned QoL, Skills, and UI phases — and was absorbed through the same loop, not by changing the loop.

Claude sessions: 107 (across 9 working days)

Assistant turns: 10k+ (10,031 across Sonnet + Opus)

Tokens processed: 727M (in + out + cache, all sessions)

Run-log entries: 119 (one per session, append-only)

Models used

Sonnet 4.6 62.5%
Opus 4.7 37.5%

Source artefacts

SPEC.md 512 lines
tasks.md 453 lines
RUN-SUMMARY.md ~210 KB

Test surface

518 tests at cutover
10-stage pre-push gate
mypy strict type-checked

Build & deploy

Docker multi-stage
Container health smoke
One-command deploy.sh
Atomic git commits
