SPEC v1 — and what we actually shipped

Original plan (2026-05-08) had 10 single-track phases ending at cutover. Final delivery: 10 phases + 4 unplanned ones, three architectural pivots, one full UI rewrite. Read left-to-right: planned ─→ delivered.

  PLANNED                                                                     ACTUAL
  ───────                                                                     ──────
  P0  foundations         ────────────────────────────────────────────────────►  P0  done   2d
  P1  infra (TF+ansible)  ────────────────────────────────────────────────────►  P1  done   1d
  P2  observability       ──────────────────────────────────[deferred alerts/dash]─►  P2  done   1d (+P2.5/2.6 6d later)
  P3  runner foundation   ────────────────────────────────────────────────────►  P3  done   2d
  P4  worker              ────────────────────────────────────────────────────►  P4  done   1d (P-QoL.2 follow-up)
  P5  web + chat (HTMX)   ────────────────────────────────────────────────────►  P5  done   1d → later thrown away
  P6  MCP                 ────────────────────────────────────────────────────►  P6  done   3d
  P7  RAG + KG (graphiti) ──[graphiti DROPPED 2026-05-14]─────────────────►  P7  done   hand-rolled KG
  P8  agents (briefing/   ──[3 of 4 HELD indefinitely]────────────────────────►  P8  done   only token-tracker
       research/memory)
  P9  cutover             ────────────────────────────────────────────────────►  P9  done   1d
  ·· · · · · · · ·  end of original plan  · · · · · · · · · ··
                                                                              + QoL    18 tasks  ─►  done
                                                                              + Skills  6 tasks  ─►  done
                                                                              + UI-1    9 tasks  ─►  done  (terminal redesign)
                                                                              + UI-2   17 tasks  ─►  done  (HTMX → React/TS rewrite)

phases: 10→14 (+ QoL · Skills · UI · UI-2)
tracked tasks: ~60→110+ (~50 unplanned QoL/Skills/UI items)
elapsed: none→10 d (no original estimate given)
UI rewrites: 0→2× (PicoCSS → TUI → React)
tech swaps mid-flight: 0→3 (graphiti · agents · UI stack)

legend: delivered as planned · scope changed / deferred · dropped or replaced · added (unplanned) · plan reference

// phase-by-phase · SPEC §5 phase order ─→ what shipped per phase

phase · planned (SPEC §5 / §14) → delivered · when

P0

Infisical project, machine identities, AWS+S3 buckets, GitLab repos with baseline CI.

→

All 6 sub-tasks done. PF.1–PF.6 reconciliation block added up-front to bridge spec assumptions vs live infra (Infisical already existed, S3 buckets already named).

2026-05-08 · 2d

P1

Terraform S3 backend; Contabo+homelab bootstrapped via Ansible (Docker, Tailscale, Alloy, Infisical agent).

→

All 6 sub-tasks done in one push. Contabo firewall captured as Ansible role (no Contabo TF firewall resource exists), iptables-nft chains preserved verbatim.

2026-05-10 · 1d

P2

Alloy on both hosts → Grafana Cloud (logs/metrics/traces). Dashboards committed. Discord alerts wired.

→

Pipeline up 2026-05-11. P2.5 alerts & P2.6 dashboards deferred per "no alerts before stack is live" policy; finally landed 2026-05-14 via Grafana Cloud LLM-assist (4 alerts, 4 dashboards — spec called for 3).

05-11→14

P3

Settings, providers, OTel tracing, docker-compose boot, infisical-run wrapper.

→

All 7 sub-tasks done. Default model corrected ~~gemma4:2b~~ → gemma4:e2b (the real Ollama tag); discovered during execution.

05-10→11 · 2d

P4

Atomic queue poller, executor, text/chain/research handlers, Discord on success/fail, OTel spans.

→

All 6 done. Spec said "Discord final-step only" — turned out to be a UX regression vs old runner. Fixed later via P-QoL.2: widened return type to JobOutcome(text, tokens, model), executor owns notifications, chain's inline notify deleted.

2026-05-11 · 1d

P5

FastAPI factory, /health, SSE chat, queue submit/list, HTMX UI, SQLite chat history, Cloudflare tunnel swap.

→

All 7 done; HTMX UI live behind control.davidcockson.com. This UI was then replaced twice: first reskinned by Phase UI-1 (Terminal/TUI redesign, still HTMX), then rewritten by Phase UI-2 (React port). Net: ~3,200 lines of UI code deleted in P-UI2.9.

2026-05-11 · 1d

P6

FastMCP server, tools: search (ddgs+Tavily), fetch (trafilatura), vault, sandboxed code-exec. Research mode wired into chat.

→

All 6 done. /health route was missing — `curl -s` was masking 404 as healthy because curl exits 0 on 404. Fixed in P-QoL.7 with FastMCP `custom_route` decorator + `curl -fsS`.

05-11→14 · 3d

P7

Qdrant collections; ~~graphiti-core~~ KG over Neo4j; vector_indexer+kg_indexer; LangGraph hybrid pipeline; nightly re-index cron.

→

graphiti DROPPED 2026-05-14 after 3 failed cloud-provider attempts (Groq llama 400 on json_schema; Groq gpt-oss 413 TPM; Gemini OpenAI-compat 404 on /responses). Root cause: graphiti requires OpenAI Responses API + structured outputs — no viable free tier. Replaced with hand-rolled Ollama extractor (~150 LoC, (:Entity)-[:RELATES_TO {fact}]→(:Entity) + fulltext index).

05-13→14

P7.8

(implicit) worker's research handler calls the local RAG service.

→

Surfaced mid-Phase: the RAG service had no upstream caller. Wiring it up exposed a 240s latency bottleneck on Contabo CPU — worker discards RAG's answer field by design (uses its own cloud model). Fixed by P-QoL.11: added mode=context-only to skip RAG's two LLM nodes — round-trip ~120s → <1s.

2026-05-14

P8

4 tasks: per-provider token tracker + briefing / research / memory agents.

→

3 of 4 HELD indefinitely 2026-05-14 — "none of the three has a concrete trigger right now; building on spec alone risks rework." Only P8.4 token tracker shipped.

2026-05-14

P9

Vault snapshot to S3, decommission old Contabo runner, runbook, security audit.

→

P9.1 snapshot skipped (vault already covered by Gitea + Syncthing + prior nightly S3). Old runner stopped+disabled; Langfuse torn down; monitoring stack archived; §7 security audit pass; obsidian-llm-runner repo archived.

2026-05-15 · 1d

+QoL

— not in spec —

→

18 polish tasks filed during execution: UI link to outputs, Discord success regression fix, deploy wrapper, registry dedup, qdrant version drift, Gemini prefix comment, MCP health, Davas-via-Tailscale, CF Access bypass, smoke-test runbook fixes, RAG context-only mode, Neo4j auto-bootstrap, single canonical deploy command, chat sessions, executable bit, logger config, Davas e2e verification, required-secrets manifest.

05-13→14

+Skills

— not in spec —

→

Agent Skills (§17, added 2026-05-14): user-authored skill packages selectable per job. 6 tasks: loader, /skills endpoint + Job.skill_id, worker prompt injection + pin-model, UI picker, Python script path via P6.5 sandbox, live reload + echo-shout/hash-calc examples. Explicit-selection, not LLM-routed.

2026-05-14

+UI-1

— not in spec —

→

Terminal/TUI redesign (§18, added 2026-05-14): phosphor green, JetBrains Mono, scanlines, 3-pane grid. 9 tasks (tokens, top-bar+footer, queue, detail, inspector, model picker, keyboard layer, empty/loading/error, live progress %/ETA/tok-s). HTMX preserved. Done 2026-05-14. + 9 hotfixes during live use 2026-05-15.

05-14→15

+UI-2

— not in spec —

→

Full React port (decision 2026-05-16): Vite + React 18 + TypeScript. 17 tasks (P-UI2.0–P-UI2.16): backend JSON endpoints, shell+tokens+chrome, virtualised queue, single/compare/stacked/tournament views, inspector, ModelPicker+DispatchOverlay+Cheatsheet, ChatView re-port, Dockerfile multi-stage, old-UI delete pass (≥3,000 LOC deleted), prompt_hash+siblings, daily_stats SQLite, SSE reconnect with seq, responsive collapse. Done 2026-05-17.

05-16→17

// build order · spec order was strictly P0→P9. Actual order: bursty.

05-08 · fri

scaffold spec+tasks
PF.1–PF.6 reconcile
P0.1 secrets
P0.2 identities
P0.3 AWS IAM

5 items · planning

05-10 · sun

P0.4 → P0.6
P1.1 → P1.6 (all infra)
P2.1 grafana bootstrap
P3.1 → P3.5
P3.6 compose (partial)

~14 items · huge sprint

05-11 · mon

P2.2 → P2.4 alloy
P3.6, P3.7 deploy
P4.1 → P4.6 worker
P5.1 → P5.7 web
P6.1 mcp boot

~17 items · second sprint

05-13 · wed

UI model-drift fix
P7.5 LangGraph hybrid
P7.6 re-index cron
graphiti starts wobbling

~4 items · RAG arc

05-14 · thu

graphiti dropped
P7.7 hand-rolled KG
P7.8 worker↔RAG
P-QoL bundle A/B/C/D
P2.5/P2.6 alerts+dash
+Skills phase filed
+UI phase filed
P8 trimmed to P8.4

~12 items · pivot day

05-15 · fri

P9.1 → P9.8 cutover
Langfuse torn down
monitoring decom
security audit §7
UI-1 hotfix bundle

~10 items · cutover

05-16 · sat

React port begins
P-UI2.0 vite scaffold
P-UI2.1 backend JSON
P-UI2.2 → P-UI2.9
old HTMX UI deleted

~10 items · rewrite

05-17 · sun

P-UI2.10 siblings
P-UI2.11 compare
P-UI2.12 stacked
P-UI2.13 tournament
P-UI2.14 daily_stats
P-UI2.15 SSE reconnect
P-UI2.16 responsive
+HF.1 hotfix · shipped

~8 items · UI-2 finish

// 05-11 (~17 items) and 05-10 (~14 items) were the heaviest days. days off (05-09, 05-12) absent from the log.

// notable pivots · where the plan said one thing and reality said another

DROPPED

graphiti-core for the KG layer

P7 · 2026-05-14

spec: ~~graphiti-core (Zep) on top of Neo4j: agent KG library, temporal valid-from/valid-to.~~

shipped: Hand-rolled Ollama-driven extractor, ~150 LoC. Schema: (:Entity {name})-[:RELATES_TO {fact}]→(:Entity) + Lucene fulltext index for retrieval. No LLM at retrieval time.

Three failed cloud-provider attempts in a row: Groq llama 400 on json_schema; Groq gpt-oss 413 on TPM; Gemini OpenAI-compat 404 on /responses. Root cause: graphiti requires the OpenAI Responses API + structured outputs — no viable free tier.
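As a rough illustration of the shipped schema, the Cypher below shows one way the upsert, the relationship fulltext index and the LLM-free retrieval query could look. This is a sketch under assumptions, not the project's extractor: the index name `fact_index`, the `Triple` type and the helper names are hypothetical, and the statements assume a Neo4j version that supports `CREATE FULLTEXT INDEX ... IF NOT EXISTS`.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical index name; the source only says "fulltext index".
FACT_INDEX = "fact_index"

# Relationship fulltext index over the `fact` property.
CREATE_INDEX = (
    f"CREATE FULLTEXT INDEX {FACT_INDEX} IF NOT EXISTS "
    "FOR ()-[r:RELATES_TO]-() ON EACH [r.fact]"
)

# MERGE keeps entities unique by name; each extracted fact becomes one edge.
UPSERT_FACT = (
    "MERGE (a:Entity {name: $subject}) "
    "MERGE (b:Entity {name: $object}) "
    "MERGE (a)-[:RELATES_TO {fact: $fact}]->(b)"
)

# Retrieval is a plain Lucene query against the index: no LLM involved.
SEARCH_FACTS = (
    "CALL db.index.fulltext.queryRelationships($index, $query) "
    "YIELD relationship, score "
    "RETURN startNode(relationship).name AS subject, "
    "relationship.fact AS fact, "
    "endNode(relationship).name AS object, score "
    "ORDER BY score DESC LIMIT $limit"
)


@dataclass(frozen=True)
class Triple:
    """One (subject, fact, object) statement produced by the extractor."""
    subject: str
    fact: str
    object: str


def upsert_params(t: Triple) -> Dict[str, str]:
    """Parameters for UPSERT_FACT, with surrounding whitespace trimmed."""
    return {"subject": t.subject.strip(), "object": t.object.strip(), "fact": t.fact.strip()}


def search_params(query: str, limit: int = 5) -> Dict[str, object]:
    """Parameters for SEARCH_FACTS."""
    return {"index": FACT_INDEX, "query": query, "limit": limit}
```

The statements are kept as parameterised strings so they can be passed to whichever Neo4j driver call the service already uses.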

HELD

Phase 8 agents (3 of 4)

P8 · 2026-05-14

spec: ~~briefing.py (daily cron), research.py (scheduled topic research), memory.py (fact extraction → graphiti)~~

shipped: Only P8.4 per-provider daily token tracker on /health. Other three kept in §14 under a "Held" sub-block.

Per user direction: "none of the three has a concrete trigger right now; building them on spec alone risks rework once the actual use-case shows up." Promotion back is gated on a real use-case appearing.
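For orientation, a per-provider daily token tracker can be as small as the sketch below. It is an in-memory illustration only: the class name, method names and the shape of the snapshot are assumptions, and the real P8.4 tracker may persist and expose its counts differently.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Optional, Tuple


class TokenTracker:
    """In-memory per-provider, per-day token counter (persistence omitted)."""

    def __init__(self) -> None:
        self._totals: Dict[Tuple[date, str], int] = defaultdict(int)

    def record(self, provider: str, tokens: int, day: Optional[date] = None) -> None:
        """Add `tokens` to the provider's total for `day` (default: today)."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        self._totals[(day or date.today(), provider)] += tokens

    def snapshot(self, day: Optional[date] = None) -> Dict[str, int]:
        """Totals for one day, shaped for embedding in a /health payload."""
        day = day or date.today()
        return {p: n for (d, p), n in sorted(self._totals.items()) if d == day}
```

Keying on (day, provider) means the daily reset is implicit: a new date simply starts a new bucket.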

REWRITTEN ×2

HTMX UI → Terminal/TUI → React

P5 → +UI-1 → +UI-2 · 05-11 → 05-17

v1: PicoCSS + HTMX (Phase 5), shipped 2026-05-11.
v2: Terminal/TUI skin (Phase UI-1), 9 tasks, design handoff at design_handoff_runner_tui/. Still HTMX. Done 2026-05-14.
v3: React 18 + Vite + TypeScript (Phase UI-2), 17 tasks. ≥3,000 LoC of HTMX deleted in P-UI2.9. Done 2026-05-17.

Spec §16 explicitly parked the frontend-stack decision as an open item to revisit when concrete pain showed up. Pain showed up: append-only token streaming, multi-pane synchronised re-renders. React port subsumed P-UI.10 (stack decision) and P-UI.21 (stream token deltas).

DEFERRED

CI build + deploy automation

SPEC §13 · still deferred

spec: ~~test → build all images → push to GitLab registry → ssh Contabo → docker compose up~~

shipped: Test stage only. Build + deploy is manual ./deploy.sh from laptop after merge to main (P-QoL.13 collapsed it from a 6-flag incantation to one command).

"Post-MVP; track as a future task when cutover (P9) is complete and the manual flow's friction outweighs the CI work." That trigger has not fired.

REROUTED

Davas: Cloudflare Tunnel → Tailscale

P-QoL.8 · 2026-05-14

spec: ~~DAVAS_BASE_URL=https://davas-ollama.davidcockson.com~~

shipped: DAVAS_BASE_URL=http://100.84.112.38:11434 (Tailscale CGNAT). The CF tunnel value was returning 530 and never worked end-to-end.

Surfaced P-QoL.18: required-secrets manifest + fail-loud on default. Infisical silently does nothing when an expected secret is absent, so the worker had been falling through to a broken default for days.
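The fail-loud idea can be sketched as a startup check that rejects both missing values and values still equal to a known-bad default. Everything here is illustrative: the manifest layout, `check_required` and `DISCORD_WEBHOOK_URL` are hypothetical names, and the function takes an already-loaded settings mapping rather than reading the environment itself.

```python
from typing import Dict, Mapping, Optional


class MissingSecretsError(RuntimeError):
    """Raised at startup when required settings are absent or left at a default."""


# Hypothetical manifest: setting name -> known-bad default (None = no default exists).
REQUIRED_SECRETS: Dict[str, Optional[str]] = {
    "DAVAS_BASE_URL": "https://davas-ollama.davidcockson.com",
    "DISCORD_WEBHOOK_URL": None,  # assumed name, for illustration only
}


def check_required(
    settings: Mapping[str, Optional[str]],
    manifest: Mapping[str, Optional[str]] = REQUIRED_SECRETS,
) -> None:
    """Raise one error listing every missing or defaulted setting."""
    problems = []
    for name, bad_default in manifest.items():
        value = settings.get(name)
        if not value:
            problems.append(f"{name}: missing")
        elif bad_default is not None and value == bad_default:
            problems.append(f"{name}: still set to the default")
    if problems:
        raise MissingSecretsError("; ".join(problems))
```

Collecting all problems before raising means one failed boot reports every gap, instead of surfacing them one restart at a time.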

REFRAMED

Discord notifications: "fail only" → success too

P4.4 → P-QoL.2 · 2026-05-13 → 14

spec: ~~"Final-step only for chains; immediate for failures."~~

shipped: Per-completion embed with tokens, duration, model, and a View link if PUBLIC_BASE_URL is set. Widened handler return type to JobOutcome(text, tokens, model); executor owns the post.

Spec-compliant but a UX regression vs the old runner's complete | model | tokens | ms post. User expectation gap caught during live use.
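The "executor owns the post" arrangement might look roughly like this. Only the `JobOutcome(text, tokens, model)` shape comes from the notes above; `execute`, the handler and notifier signatures, and the message format are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class JobOutcome:
    text: str
    tokens: int
    model: str


Handler = Callable[[str], JobOutcome]   # assumed: handlers take the prompt text
Notifier = Callable[[str], None]        # assumed: e.g. a function posting to Discord


def execute(prompt: str, handler: Handler, notify: Notifier) -> JobOutcome:
    """Run one job and post exactly one notification, success or failure."""
    try:
        outcome = handler(prompt)
    except Exception as exc:
        notify(f"failed | {type(exc).__name__}: {exc}")
        raise
    notify(f"complete | {outcome.model} | {outcome.tokens} tokens")
    return outcome
```

Because handlers only return data, a chain handler has no reason to notify inline, which is what allows the duplicate notify path to be deleted.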

// tech stack diff · SPEC §3 table vs what's actually running on Contabo

  LAYER            PLANNED                                                    SHIPPED
  ─────            ───────                                                    ───────
  Frontend         HTMX + PicoCSS (SPEC §10)                               →  React 18 + Vite + TypeScript + Zustand + React Query + @tanstack/react-virtual
  KG library       graphiti-core (Zep)                                     →  Hand-rolled Ollama extractor (~150 LoC)
  KG schema        Temporal valid-from/valid-to facts                      →  (:Entity)-[:RELATES_TO {fact}]→(:Entity) + fulltext index
  Default model    gemma4:2b (spec changelog 2026-05-10)                   →  gemma4:e2b (real Ollama tag)
  Davas route      Cloudflare Tunnel                                       →  Tailscale 100.84.112.38
  Deploy           GitLab CI build + push + ssh deploy                     →  ./deploy.sh from laptop (test stage only in CI)
  S3 bucket names  dc-tfstate · dc-vault-snapshots · dc-rebuild-artifacts  →  davidcockson-tfstate · davidcockson-vault-snapshots · davidcockson-artifacts (us-east-1)
  IAM user         terraform                                               →  homelab-svc (pre-existed)
  Docs site        mkdocs                                                  →  replacement TBD (open item)
  CI deploy        test → build → push → ssh + docker compose              →  test only · build+deploy manual (deferred)
  Held agents      briefing.py · research.py · memory.py                   →  parked indefinitely

// unplanned phases: what got added · ~50 tasks no-one wrote down on 2026-05-08

Phase QoL · 18 tasks · 2026-05-13 → 14

Polish bundle filed when smoke-testing the running stack surfaced 18 small but real frictions. Originally a single audit; grew as use revealed it.

UI link to completed outputs · Discord success regression · deploy wrapper rebuild flag

Router/UI model dedup · qdrant version drift · Gemini prefix doc

MCP /health + tighter healthcheck · Davas → Tailscale · CF Access bypass for /health

Smoke-test runbook fixes · RAG context-only mode · auto-bootstrap Neo4j schema

Single canonical deploy command · chat sessions (new/switch/delete)

deploy.sh executable bit · RAG logger config · Davas e2e verification · required-secrets manifest
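The RAG context-only item in this bundle comes down to an early return before any LLM work. The sketch below shows the shape of that switch under simplifying assumptions: the real service is described as skipping two LLM nodes in a LangGraph pipeline, whereas here they are collapsed into a single hypothetical `generate` callable, and `run_rag` and `RagResult` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RagResult:
    context: List[str]
    answer: Optional[str] = None


def run_rag(
    query: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    mode: str = "full",
) -> RagResult:
    """Retrieve always; only call the LLM step when the caller wants an answer."""
    if mode not in {"full", "context-only"}:
        raise ValueError(f"unknown mode: {mode!r}")
    context = retrieve(query)
    if mode == "context-only":
        return RagResult(context=context)  # no LLM call on this path
    return RagResult(context=context, answer=generate(query, context))
```

A caller that discards the answer anyway, as the worker does, then pays only for retrieval.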

Phase Skills · 6 tasks · 2026-05-14

SPEC §17 added mid-flight: user-authored skill packages selectable per job, as a testing harness for Dave's own skills — not an LLM-routed autonomous capability layer. Explicit selection in the UI.

Skill loader (scan SKILLS_DIR, parse SKILL.md frontmatter+body, reject duplicates)

/skills endpoint + Job.skill_id field

Worker injects skill instructions; pin-model override; logged

UI skill picker next to model picker

Python script path (scripts/main.py) via the P6.5 sandbox

Live reload (/skills/reload) + echo-shout / hash-calc bundled examples
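The loader task listed first above (scan, parse frontmatter + body, reject duplicates) could be sketched as follows. The on-disk layout (`<SKILLS_DIR>/<skill>/SKILL.md`), the flat `key: value` frontmatter and the `name` field are assumptions for illustration; the real SKILL.md format may be richer.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Tuple


@dataclass(frozen=True)
class Skill:
    name: str
    meta: Dict[str, str]
    body: str


def parse_skill(text: str) -> Tuple[Dict[str, str], str]:
    """Split '---' delimited frontmatter (flat key: value lines) from the body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter block")
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---")
    except StopIteration:
        raise ValueError("unterminated frontmatter") from None
    meta: Dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"bad frontmatter line: {line!r}")
        meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:]).strip()


def load_skills(skills_dir: Path) -> Dict[str, Skill]:
    """Load every <dir>/SKILL.md under skills_dir, rejecting duplicate names."""
    skills: Dict[str, Skill] = {}
    for path in sorted(skills_dir.glob("*/SKILL.md")):
        meta, body = parse_skill(path.read_text(encoding="utf-8"))
        name = meta.get("name", path.parent.name)  # fall back to the folder name
        if name in skills:
            raise ValueError(f"duplicate skill name: {name}")
        skills[name] = Skill(name=name, meta=meta, body=body)
    return skills
```

Calling `load_skills` again and swapping the returned dict in one assignment is one simple way to back a reload endpoint.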

Phase UI-1 (terminal redesign) · 9 tasks + 9 hotfixes · 2026-05-14 → 15

SPEC §18: phosphor-green 3-pane terminal aesthetic replacing the PicoCSS HTMX UI. Fidelity-first port of the design_handoff_runner_tui/ bundle. HTMX preserved; one backend payload tweak (SSE event gains tokens_so_far / max_tokens / started_at).

Tokens + shell · top-bar + footer chrome · queue · detail single-view · inspector

Model picker popover (⌘K) · global keyboard layer + ? cheatsheet

Empty/loading/error states · live progress %/ETA/tok/s (only task touching the worker)

Hotfix bundle 2026-05-15: prompt body field, Davas models in picker, +new button visible, row click opens detail, centre pane width cap, chat page restored, SSE hardening, real Gemini 3.x IDs, filter bar wrap
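The live progress figures can be derived from the three SSE fields named above (tokens_so_far, max_tokens, started_at). The arithmetic below is a sketch under assumptions: the `Progress` type and function name are hypothetical, timestamps are taken as seconds, and max_tokens is treated as the generation cap, so the ETA is an upper bound rather than a prediction.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Progress:
    percent: float
    tok_per_s: float
    eta_s: Optional[float]  # None until a rate can be measured


def progress(tokens_so_far: int, max_tokens: int, started_at: float, now: float) -> Progress:
    """Compute percent complete, throughput and worst-case seconds remaining."""
    elapsed = max(now - started_at, 0.0)
    rate = tokens_so_far / elapsed if elapsed > 0 else 0.0
    percent = min(100.0, 100.0 * tokens_so_far / max_tokens) if max_tokens > 0 else 0.0
    remaining = max(max_tokens - tokens_so_far, 0)
    eta = remaining / rate if rate > 0 else None
    return Progress(percent=percent, tok_per_s=rate, eta_s=eta)
```

For example, 250 of 1000 tokens after 10 seconds gives 25%, 25 tok/s and at most 30 seconds remaining.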

Phase UI-2 (React port) · 17 tasks · 2026-05-16 → 17

Decision made 2026-05-16: stack swap to React 18 + Vite + TypeScript + Zustand + React Query. Reasoning: append-only token streaming and multi-pane synchronised re-renders showed concrete pain (P-UI.21 stream-token-deltas deferred from UI-1).

Vite scaffold · backend JSON endpoints · shell+chrome · virtualised queue (@tanstack/react-virtual)

Single view + StreamingOutput + StatStrip (60Hz RAF-throttled) · inspector · ModelPicker · DispatchOverlay · Cheatsheet

ChatView re-port · Dockerfile multi-stage · OLD UI DELETE PASS (≥3,000 LoC)

siblings + prompt_hash · CompareView · StackedView · TournamentView + ratings SQLite

daily_stats SQLite table · SSE reconnect with seq + ring buffer · responsive collapse

+HF.1 hotfix 2026-05-17: compare/tournament duplicate-job slug bug (filename collision when N jobs land in same second), stat strip on cards, global CSS reset
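The "SSE reconnect with seq + ring buffer" task listed above typically pairs a bounded event log with the client's last seen sequence number. A minimal sketch, assuming single-line event payloads and hypothetical names (`EventRing`, `replay_after`, `format_sse`); the shipped implementation may differ:

```python
from collections import deque
from typing import Deque, List, Tuple


class EventRing:
    """Keeps the last `capacity` events, each tagged with an increasing seq."""

    def __init__(self, capacity: int = 256) -> None:
        self._events: Deque[Tuple[int, str]] = deque(maxlen=capacity)
        self._next_seq = 1

    def append(self, data: str) -> int:
        """Store one event and return the seq assigned to it."""
        seq = self._next_seq
        self._events.append((seq, data))
        self._next_seq += 1
        return seq

    def replay_after(self, last_seq: int) -> Tuple[bool, List[Tuple[int, str]]]:
        """Return (complete, events newer than last_seq).

        complete is False when events after last_seq were already evicted,
        so the client should refetch full state instead of trusting the replay.
        """
        oldest = self._events[0][0] if self._events else self._next_seq
        complete = last_seq >= oldest - 1
        return complete, [(s, d) for s, d in self._events if s > last_seq]


def format_sse(seq: int, data: str) -> str:
    """Frame one single-line event; a browser EventSource sends the id back
    as the Last-Event-ID header when it reconnects."""
    return f"id: {seq}\ndata: {data}\n\n"
```

On reconnect the server reads the last seen seq, replays what the ring still holds, and falls back to a full refresh when `complete` is False.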

// bottom line · where the plan held, where it bent

Plan held

Repo layout (§4): five repos exactly as specced.
Phase 0–6 scope: shipped as written, apart from the P2.5/P2.6 alerts and dashboards deferred to 2026-05-14.
Non-negotiables (150 LoC/file, pytest only, Pydantic Settings, no os.environ, Conventional Commits): held throughout.
Vector store (Qdrant), graph store (Neo4j), MCP framework (FastMCP), tracing (OTel→Alloy→Grafana Cloud): all unchanged.
Filesystem-as-queue (§8): unchanged.
Public ingress via Cloudflare Tunnel (no open ports): held.
Cutover sequencing (P9): runbook executed cleanly.
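A filesystem-as-queue design usually rests on one property: a rename within a single filesystem is atomic, so at most one poller can claim a given job file. The sketch below illustrates that claim step only. The `pending/` and `running/` directory names, the `.json` suffix and filename-ordered processing are assumptions, not details taken from SPEC §8.

```python
import os
from pathlib import Path
from typing import Optional


def claim_next(queue_root: Path) -> Optional[Path]:
    """Move the first pending job (by filename) into running/ and return its new path."""
    pending = queue_root / "pending"
    running = queue_root / "running"
    running.mkdir(parents=True, exist_ok=True)
    for job in sorted(pending.glob("*.json")):
        target = running / job.name
        try:
            os.rename(job, target)  # atomic when both paths share a filesystem
        except FileNotFoundError:
            continue  # another poller claimed this file first; try the next one
        return target
    return None  # nothing left to claim
```

A poller that loses the race simply moves on to the next file, so no lock file or database row is needed for mutual exclusion.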

Plan bent

graphiti dropped — required OpenAI Responses API, no free-tier path.
3 of 4 P8 agents held — no concrete trigger for briefing/research/memory.
UI rewritten twice — HTMX → terminal skin → React/TS.
~50 unplanned tasks filed under QoL / Skills / UI / UI-2.
CI deploy stage deferred indefinitely; manual deploy.sh wins.
Davas reachable via Tailscale, not the CF tunnel SPEC §6 named.
7 SPEC drift fixes reconciled 2026-05-14 (bucket names, default model, env vars, services).
SPEC.md rotted from day 3 onward: "execution decisions flowed into tasks.md / RUN-SUMMARY.md only." The reconciliation pass closed the gap, but the pattern will recur unless the habit changes.