▮ rebuild ~ audit › plan vs actual ● shipped 10d elapsed v1 → v∞ 2026-05-08 → 2026-05-17

SPEC v1 — and what we actually shipped

Original plan (2026-05-08) had 10 single-track phases ending at cutover. Final delivery: 10 phases + 4 unplanned ones, three architectural pivots, one full UI rewrite. Read left-to-right: planned ─→ delivered.
  PLANNED                                                                     ACTUAL
  ───────                                                                     ──────
  P0  foundations         ────────────────────────────────────────────────────►  P0  done   2d
  P1  infra (TF+ansible)  ────────────────────────────────────────────────────►  P1  done   1d
  P2  observability       ──────────────────────────────────[deferred alerts/dash]─►  P2  done   1d (+P2.5/2.6 6d later)
  P3  runner foundation   ────────────────────────────────────────────────────►  P3  done   2d
  P4  worker              ────────────────────────────────────────────────────►  P4  done   1d (P-QoL.2 follow-up)
  P5  web + chat (HTMX)   ────────────────────────────────────────────────────►  P5  done   1d → later thrown away
  P6  MCP                 ────────────────────────────────────────────────────►  P6  done   3d
  P7  RAG + KG (graphiti) ─[graphiti DROPPED 2026-05-14]─────────────────►  P7  done   hand-rolled KG
  P8  agents (briefing/   ──[3 of 4 HELD indefinitely]────────────────────────►  P8  done   only token-tracker
       research/memory)
  P9  cutover             ────────────────────────────────────────────────────►  P9  done   1d
  ·· · · · · · · ·  end of original plan  · · · · · · · · · ··
                                                                              + QoL    18 tasks  ─►  done
                                                                              + Skills  6 tasks  ─►  done
                                                                              + UI-1    9 tasks  ─►  done  (terminal redesign)
                                                                              + UI-2   17 tasks  ─►  done  (HTMX → React/TS rewrite)
phases: 10 → 14 (+ QoL · Skills · UI · UI-2)
tracked tasks: ~60 → 110+ (~50 unplanned QoL/Skills/UI items)
elapsed: 10 d (no original estimate given)
UI rewrites: 0 → 2 (PicoCSS → TUI → React)
tech swaps mid-flight: 0 → 3 (graphiti · agents · UI stack)
legend: delivered as planned · scope changed / deferred · dropped or replaced · added (unplanned) · plan reference
// phase-by-phase SPEC §5 phase order ─→ what shipped per phase
phase
planned (SPEC §5 / §14)
delivered
when
P0
Infisical project, machine identities, AWS+S3 buckets, GitLab repos with baseline CI.
All 6 sub-tasks done. PF.1–PF.6 reconciliation block added up-front to bridge spec assumptions vs live infra (Infisical already existed, S3 buckets already named).
2026-05-08 · 2d
P1
Terraform S3 backend; Contabo+homelab bootstrapped via Ansible (Docker, Tailscale, Alloy, Infisical agent).
All 6 sub-tasks done in one push. Contabo firewall captured as Ansible role (no Contabo TF firewall resource exists), iptables-nft chains preserved verbatim.
2026-05-10 · 1d
P2
Alloy on both hosts → Grafana Cloud (logs/metrics/traces). Dashboards committed. Discord alerts wired.
Pipeline up 2026-05-11. P2.5 alerts & P2.6 dashboards deferred per "no alerts before stack is live" policy; finally landed 2026-05-14 via Grafana Cloud LLM-assist (4 alerts, 4 dashboards — spec called for 3).
05-11→14
P3
Settings, providers, OTel tracing, docker-compose boot, infisical-run wrapper.
All 7 sub-tasks done. Default model corrected gemma4:2b → gemma4:e2b (the real Ollama tag); discovered during execution.
05-10→11 · 2d
P4
Atomic queue poller, executor, text/chain/research handlers, Discord on success/fail, OTel spans.
All 6 done. Spec said "Discord final-step only" — turned out to be a UX regression vs old runner. Fixed later via P-QoL.2: widened return type to JobOutcome(text, tokens, model), executor owns notifications, chain's inline notify deleted.
2026-05-11 · 1d
P5
FastAPI factory, /health, SSE chat, queue submit/list, HTMX UI, SQLite chat history, Cloudflare tunnel swap.
All 7 done; HTMX UI live behind control.davidcockson.com. But this UI was thrown away twice — first by Phase UI (Terminal/TUI redesign), then by Phase UI-2 (React port). Net: ~3,200 lines of UI code deleted in P-UI2.9.
2026-05-11 · 1d
P6
FastMCP server, tools: search (ddgs+Tavily), fetch (trafilatura), vault, sandboxed code-exec. Research mode wired into chat.
All 6 done. /health route was missing — `curl -s` was masking 404 as healthy because curl exits 0 on 404. Fixed in P-QoL.7 with FastMCP `custom_route` decorator + `curl -fsS`.
05-11→14 · 3d
P7
Qdrant collections; graphiti-core KG over Neo4j; vector_indexer+kg_indexer; LangGraph hybrid pipeline; nightly re-index cron.
graphiti DROPPED 2026-05-14 after 3 failed cloud-provider attempts (Groq llama 400 on json_schema; Groq gpt-oss 413 TPM; Gemini OpenAI-compat 404 on /responses). Root cause: graphiti requires OpenAI Responses API + structured outputs — no viable free tier. Replaced with hand-rolled Ollama extractor (~150 LoC, (:Entity)-[:RELATES_TO {fact}]→(:Entity) + fulltext index).
05-13→14
P7.8
(implicit) worker's research handler calls the local RAG service.
Surfaced mid-Phase: the RAG service had no upstream caller. Wiring it up exposed a 240s latency bottleneck on Contabo CPU — worker discards RAG's answer field by design (uses its own cloud model). Fixed by P-QoL.11: added mode=context-only to skip RAG's two LLM nodes — round-trip ~120s → <1s.
2026-05-14
P8
4 tasks: per-provider token tracker + briefing / research / memory agents.
3 of 4 HELD indefinitely 2026-05-14 — "none of the three has a concrete trigger right now; building on spec alone risks rework." Only P8.4 token tracker shipped.
2026-05-14
P9
Vault snapshot to S3, decommission old Contabo runner, runbook, security audit.
P9.1 snapshot skipped (vault already covered by Gitea + Syncthing + prior nightly S3). Old runner stopped+disabled; Langfuse torn down; monitoring stack archived; §7 security audit pass; obsidian-llm-runner repo archived.
2026-05-15 · 1d
+QoL
— not in spec —
18 polish tasks filed during execution: UI link to outputs, Discord success regression fix, deploy wrapper, registry dedup, qdrant version drift, Gemini prefix comment, MCP health, Davas-via-Tailscale, CF Access bypass, smoke-test runbook fixes, chat sessions, required-secrets manifest, RAG context-only mode, Neo4j auto-bootstrap, single canonical deploy command, executable bit, logger config, Davas e2e verification.
05-13→14
+Skills
— not in spec —
Agent Skills (§17, added 2026-05-14): user-authored skill packages selectable per job. 6 tasks: loader, /skills endpoint + Job.skill_id, worker prompt injection + pin-model, UI picker, Python script path via P6.5 sandbox, live reload + echo-shout/hash-calc examples. Explicit-selection, not LLM-routed.
2026-05-14
+UI-1
— not in spec —
Terminal/TUI redesign (§18, added 2026-05-14): phosphor green, JetBrains Mono, scanlines, 3-pane grid. 9 tasks (tokens, top-bar+footer, queue, detail, inspector, model picker, keyboard layer, empty/loading/error, live progress %/ETA/tok-s). HTMX preserved. Done 2026-05-14, plus 9 hotfixes during live use on 2026-05-15.
05-14→15
+UI-2
— not in spec —
Full React port (decision 2026-05-16): Vite + React 18 + TypeScript. 17 tasks (P-UI2.0–P-UI2.16): backend JSON endpoints, shell+tokens+chrome, virtualised queue, single/compare/stacked/tournament views, inspector, ModelPicker+DispatchOverlay+Cheatsheet, ChatView re-port, Dockerfile multi-stage, old-UI delete pass (≥3,000 LOC deleted), prompt_hash+siblings, daily_stats SQLite, SSE reconnect with seq, responsive collapse. Done 2026-05-17.
05-16→17
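The context-only fix from P7.8 / P-QoL.11 amounts to returning right after retrieval when the caller does not need RAG's answer. A minimal sketch of that short-circuit, assuming hypothetical names (`run_pipeline`, `RagResult`, the injected `retrieve` and `llm` callables); the source does not say what the two skipped LLM nodes do, so their roles here are placeholders:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RagResult:
    context: list[str]
    answer: Optional[str] = None
    llm_calls: int = 0


def run_pipeline(
    query: str,
    retrieve: Callable[[str], list[str]],
    llm: Callable[[str], str],
    mode: str = "full",
) -> RagResult:
    """Run retrieval, then the LLM nodes unless mode is 'context-only'."""
    result = RagResult(context=retrieve(query))
    if mode == "context-only":
        # The caller only wants the retrieved passages, so skip both LLM nodes.
        return result
    # Hypothetical LLM node 1: condense the retrieved passages.
    notes = llm("Summarise for: " + query + "\n" + "\n".join(result.context))
    # Hypothetical LLM node 2: write the final answer from those notes.
    result.answer = llm("Answer: " + query + "\nNotes: " + notes)
    result.llm_calls = 2
    return result
```

In this shape the worker would call `run_pipeline(query, retrieve, llm, mode="context-only")` and pass `result.context` to its own cloud model, so the slow CPU-bound LLM calls never run.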
// build order spec order was strictly P0→P9. Actual order: bursty.
05-08 · fri
  • scaffold spec+tasks
  • PF.1–PF.6 reconcile
  • P0.1 secrets
  • P0.2 identities
  • P0.3 AWS IAM
5 items · planning
05-10 · sun
  • P0.4 → P0.6
  • P1.1 → P1.6 (all infra)
  • P2.1 grafana bootstrap
  • P3.1 → P3.5
  • P3.6 compose (partial)
~14 items · huge sprint
05-11 · mon
  • P2.2 → P2.4 alloy
  • P3.6, P3.7 deploy
  • P4.1 → P4.6 worker
  • P5.1 → P5.7 web
  • P6.1 mcp boot
~17 items · second sprint
05-13 · wed
  • UI model-drift fix
  • P7.5 LangGraph hybrid
  • P7.6 re-index cron
  • graphiti starts wobbling
~4 items · RAG arc
05-14 · thu
  • graphiti dropped
  • P7.7 hand-rolled KG
  • P7.8 worker↔RAG
  • P-QoL bundle A/B/C/D
  • P2.5/P2.6 alerts+dash
  • +Skills phase filed
  • +UI phase filed
  • P8 trimmed to P8.4
~12 items · pivot day
05-15 · fri
  • P9.1 → P9.8 cutover
  • Langfuse torn down
  • monitoring decom
  • security audit §7
  • UI-1 hotfix bundle
~10 items · cutover
05-16 · sat
  • React port begins
  • P-UI2.0 vite scaffold
  • P-UI2.1 backend JSON
  • P-UI2.2 → P-UI2.9
  • old HTMX UI deleted
~10 items · rewrite
05-17 · sun
  • P-UI2.10 siblings
  • P-UI2.11 compare
  • P-UI2.12 stacked
  • P-UI2.13 tournament
  • P-UI2.14 daily_stats
  • P-UI2.15 SSE reconnect
  • P-UI2.16 responsive
  • +HF.1 hotfix · shipped
~8 items · UI-2 finish
// 05-10 (sun) and 05-11 (mon) were the heaviest. days off (05-09, 05-12) absent from the log.
// notable pivots where the plan said one thing and reality said another
DROPPED

graphiti-core for the KG layer

P7 · 2026-05-14
spec: graphiti-core (Zep) on top of Neo4j — agent KG library, temporal valid-from/valid-to.
shipped: Hand-rolled Ollama-driven extractor, ~150 LoC. Schema: (:Entity {name})-[:RELATES_TO {fact}]→(:Entity) + Lucene fulltext index for retrieval. No LLM at retrieval time.
Three failed cloud-provider attempts in a row: Groq llama 400 on json_schema; Groq gpt-oss 413 on TPM; Gemini OpenAI-compat 404 on /responses. Root cause: graphiti requires the OpenAI Responses API + structured outputs — no viable free tier.
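The shipped schema is small enough to write down as a few Cypher statements. A minimal sketch, assuming Neo4j 5 syntax; the constraint and index names, the choice to index the `fact` property of the relationship, and the `fact_params` helper are all assumptions, not taken from the actual extractor:

```python
# Cypher for the (:Entity {name})-[:RELATES_TO {fact}]->(:Entity) schema.
# Index and constraint names are hypothetical.
SCHEMA_STATEMENTS = [
    "CREATE CONSTRAINT entity_name IF NOT EXISTS "
    "FOR (e:Entity) REQUIRE e.name IS UNIQUE",
    "CREATE FULLTEXT INDEX fact_fulltext IF NOT EXISTS "
    "FOR ()-[r:RELATES_TO]-() ON EACH [r.fact]",
]

# Idempotent write of one extracted triple.
UPSERT_FACT = (
    "MERGE (a:Entity {name: $subject}) "
    "MERGE (b:Entity {name: $object}) "
    "MERGE (a)-[r:RELATES_TO {fact: $fact}]->(b)"
)

# Retrieval is a plain fulltext lookup, so no LLM is involved at query time.
SEARCH_FACTS = (
    "CALL db.index.fulltext.queryRelationships('fact_fulltext', $query) "
    "YIELD relationship, score "
    "RETURN startNode(relationship).name AS subject, "
    "relationship.fact AS fact, "
    "endNode(relationship).name AS object, score "
    "ORDER BY score DESC LIMIT $limit"
)


def fact_params(subject: str, fact: str, obj: str) -> dict[str, str]:
    """Normalise one extracted triple into parameters for UPSERT_FACT."""
    cleaned = {
        "subject": subject.strip(),
        "fact": fact.strip(),
        "object": obj.strip(),
    }
    if not all(cleaned.values()):
        raise ValueError("subject, fact and object must all be non-empty")
    return cleaned
```

The Ollama extractor's only job in this picture is to emit `(subject, fact, object)` triples; everything after that is deterministic Cypher.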
HELD

Phase 8 agents (3 of 4)

P8 · 2026-05-14
spec: briefing.py (daily cron), research.py (scheduled topic research), memory.py (fact extraction → graphiti)
shipped: Only P8.4 per-provider daily token tracker on /health. Other three kept in §14 under a "Held" sub-block.
Per user direction: "none of the three has a concrete trigger right now; building them on spec alone risks rework once the actual use-case shows up." Promotion back is gated on a real use-case appearing.
REWRITTEN ×2

HTMX UI → Terminal/TUI → React

P5 → +UI-1 → +UI-2 · 05-11 → 05-17
v1: PicoCSS + HTMX (Phase 5) — shipped 2026-05-11
v2: Terminal/TUI skin (Phase UI-1) — 9 tasks, design handoff at design_handoff_runner_tui/. Still HTMX. Done 2026-05-14.
v3: React 18 + Vite + TypeScript (Phase UI-2) — 17 tasks. ≥3,000 LoC of HTMX deleted in P-UI2.9. Done 2026-05-17.
Spec §16 explicitly parked the frontend-stack decision as an open item to revisit when concrete pain showed up. Pain showed up: append-only token streaming, multi-pane synchronised re-renders. React port subsumed P-UI.10 (stack decision) and P-UI.21 (stream token deltas).
DEFERRED

CI build + deploy automation

SPEC §13 · still deferred
spec: test → build all images → push to GitLab registry → ssh Contabo → docker compose up
shipped: Test stage only. Build + deploy is manual ./deploy.sh from laptop after merge to main (P-QoL.13 collapsed it from a 6-flag incantation to one command).
"Post-MVP; track as a future task when cutover (P9) is complete and the manual flow's friction outweighs the CI work." That trigger has not fired.
REROUTED

Davas: Cloudflare Tunnel → Tailscale

P-QoL.8 · 2026-05-14
spec: DAVAS_BASE_URL=https://davas-ollama.davidcockson.com
shipped: DAVAS_BASE_URL=http://100.84.112.38:11434 (Tailscale CGNAT). The CF tunnel value was returning 530 — never worked end-to-end.
Surfaced P-QoL.18: required-secrets manifest + fail-loud on default. Infisical silently does nothing when an expected secret is absent — the worker had been falling through to a broken default for weeks.
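The required-secrets manifest idea can be sketched as a startup check that refuses to boot on a missing or placeholder value. This is an illustration under assumptions: `check_required_secrets`, `MissingSecretsError`, and the `DISCORD_WEBHOOK_URL` key are hypothetical, and treating the old CF tunnel URL as the known-bad default is a guess at what the worker was falling through to. The environment is passed in as a mapping so the check stays compatible with the no-os.environ rule.

```python
from typing import Mapping, Optional


class MissingSecretsError(RuntimeError):
    """Raised at startup when required secrets are absent or left at a default."""


def check_required_secrets(
    env: Mapping[str, str],
    required: Mapping[str, Optional[str]],
) -> None:
    """Fail loudly if a required key is missing, empty, or still a known-bad default.

    `required` maps each secret name to a default that must not survive into
    a live deployment, or to None when no such default exists.
    """
    problems = []
    for name, bad_default in required.items():
        value = env.get(name, "")
        if not value:
            problems.append(f"{name}: missing")
        elif bad_default is not None and value == bad_default:
            problems.append(f"{name}: still set to default")
    if problems:
        # One error listing every problem beats failing on the first one.
        raise MissingSecretsError("; ".join(problems))
```

Called once before the worker starts polling, a check like this turns a silently absent secret into an immediate crash with the offending key names in the message.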
REFRAMED

Discord notifications: "fail only" → success too

P4.4 → P-QoL.2 · 2026-05-13 → 14
spec: "Final-step only for chains; immediate for failures."
shipped: Per-completion embed with tokens, duration, model, and a View link if PUBLIC_BASE_URL is set. Widened handler return type to JobOutcome(text, tokens, model); executor owns the post.
Spec-compliant but a UX regression vs the old runner's complete | model | tokens | ms post. User expectation gap caught during live use.
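The shape of the P-QoL.2 change (handlers return a `JobOutcome`, the executor alone posts) can be sketched as follows. Only the `JobOutcome(text, tokens, model)` fields come from the source; the `execute` signature, the injected `notify` callable, and the message format are assumptions made for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class JobOutcome:
    text: str
    tokens: int
    model: str


def execute(
    job_id: str,
    handler: Callable[[], JobOutcome],
    notify: Callable[[str], None],
) -> Optional[JobOutcome]:
    """Run one handler and post exactly one notification for it."""
    started = time.monotonic()
    try:
        outcome = handler()
    except Exception as exc:
        # Failures are reported immediately, as the spec required.
        notify(f"failed | {job_id} | {type(exc).__name__}: {exc}")
        return None
    elapsed_ms = int((time.monotonic() - started) * 1000)
    # Success post mirrors the old runner's "complete | model | tokens | ms" line.
    notify(f"complete | {outcome.model} | {outcome.tokens} | {elapsed_ms}ms")
    return outcome
```

Because handlers no longer call the notifier themselves, a chain handler cannot double-post, which is why the inline notify could be deleted.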
// tech stack diff SPEC §3 table vs what's actually running on Contabo
layer            planned                                                 shipped
─────            ───────                                                 ───────
Frontend         HTMX + PicoCSS (SPEC §10)                               React 18 + Vite + TypeScript + Zustand + React Query + @tanstack/react-virtual
KG library       graphiti-core (Zep)                                     Hand-rolled Ollama extractor (~150 LoC)
KG schema        Temporal valid-from/valid-to facts                      (:Entity)-[:RELATES_TO {fact}]→(:Entity) + fulltext index
Default model    gemma4:2b (spec changelog 2026-05-10)                   gemma4:e2b (real Ollama tag)
Davas route      Cloudflare Tunnel                                       Tailscale 100.84.112.38
Deploy           GitLab CI build + push + ssh deploy                     ./deploy.sh from laptop (test stage only in CI)
S3 bucket names  dc-tfstate · dc-vault-snapshots · dc-rebuild-artifacts  davidcockson-tfstate · davidcockson-vault-snapshots · davidcockson-artifacts (us-east-1)
IAM user         terraform                                               homelab-svc (pre-existed)
Docs site        mkdocs                                                  — replacement TBD (open item) —
CI deploy        test → build → push → ssh + docker compose              test only · build+deploy manual (deferred)
Held agents      briefing.py · research.py · memory.py                   — parked indefinitely —
// unplanned phases — what got added ~50 tasks no-one wrote down on 2026-05-08
Phase QoL 18 tasks · 2026-05-13 → 14
Polish bundle filed when smoke-testing the running stack surfaced 18 small but real frictions. Originally a single audit; it grew as live use surfaced more.
UI link to completed outputs · Discord success regression · deploy wrapper rebuild flag
Router/UI model dedup · qdrant version drift · Gemini prefix doc
MCP /health + tighter healthcheck · Davas → Tailscale · CF Access bypass for /health
Smoke-test runbook fixes · RAG context-only mode · auto-bootstrap Neo4j schema
Single canonical deploy command · chat sessions (new/switch/delete)
deploy.sh executable bit · RAG logger config · Davas e2e verification · required-secrets manifest
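The MCP /health item in this bundle came down to a probe that treated any HTTP response as success. A Python equivalent of the stricter `curl -fsS` behaviour, as a minimal sketch; `is_healthy` is a hypothetical helper, not part of the shipped stack:

```python
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True only for a 2xx response, the behaviour `curl -f` adds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # urlopen raises HTTPError (a URLError subclass) on 4xx/5xx;
        # refused connections and timeouts also land here.
        return False
```

With this shape a 404 on a missing /health route reads as unhealthy instead of passing, which is the failure mode the original `curl -s` check hid.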
Phase Skills 6 tasks · 2026-05-14
SPEC §17 added mid-flight: user-authored skill packages selectable per job, as a testing harness for Dave's own skills — not an LLM-routed autonomous capability layer. Explicit selection in the UI.
Skill loader (scan SKILLS_DIR, parse SKILL.md frontmatter+body, reject duplicates)
/skills endpoint + Job.skill_id field
Worker injects skill instructions; pin-model override; logged
UI skill picker next to model picker
Python script path (scripts/main.py) via the P6.5 sandbox
Live reload (/skills/reload) + echo-shout / hash-calc bundled examples
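The loader task above (scan, parse frontmatter plus body, reject duplicates) might look roughly like this. A minimal sketch under assumptions: one directory per skill under SKILLS_DIR, flat `key: value` frontmatter between `---` lines, and a `name` key as the skill id. None of these details are confirmed by the source, and a real loader would likely use a YAML parser.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Skill:
    skill_id: str
    meta: dict
    body: str


def parse_skill_md(text: str) -> tuple[dict, str]:
    """Split '---' delimited frontmatter (flat key: value lines) from the body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter block")
    try:
        end = next(
            i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"
        )
    except StopIteration:
        raise ValueError("frontmatter block is not closed") from None
    meta = {}
    for line in lines[1:end]:
        if line.strip():
            # Split on the first colon only, so values like 'gemma4:e2b' survive.
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:]).strip()


def load_skills(skills_dir: Path) -> dict[str, Skill]:
    """Scan skills_dir/*/SKILL.md and reject duplicate skill ids."""
    skills: dict[str, Skill] = {}
    for path in sorted(skills_dir.glob("*/SKILL.md")):
        meta, body = parse_skill_md(path.read_text(encoding="utf-8"))
        skill_id = meta.get("name") or path.parent.name
        if skill_id in skills:
            raise ValueError(f"duplicate skill id: {skill_id}")
        skills[skill_id] = Skill(skill_id, meta, body)
    return skills
```

A live-reload endpoint can then simply call `load_skills` again and swap the resulting dict in, since the function holds no state between calls.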
Phase UI-1 (terminal redesign) 9 tasks + 9 hotfixes · 2026-05-14 → 15
SPEC §18: phosphor-green 3-pane terminal aesthetic replacing the PicoCSS HTMX UI. Fidelity-first port of the design_handoff_runner_tui/ bundle. HTMX preserved; one backend payload tweak (SSE event gains tokens_so_far / max_tokens / started_at).
Tokens + shell · top-bar + footer chrome · queue · detail single-view · inspector
Model picker popover (⌘K) · global keyboard layer + ? cheatsheet
Empty/loading/error states · live progress %/ETA/tok/s (only task touching the worker)
Hotfix bundle 2026-05-15: prompt body field, Davas models in picker, +new button visible, row click opens detail, centre pane width cap, chat page restored, SSE hardening, real Gemini 3.x IDs, filter bar wrap
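The live progress readout can be derived entirely from the three fields added to the SSE event (tokens_so_far, max_tokens, started_at). A minimal sketch of that arithmetic, assuming `started_at` is a Unix timestamp in seconds; `compute_progress` and `Progress` are hypothetical names:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Progress:
    percent: float
    tok_per_s: float
    eta_s: Optional[float]


def compute_progress(
    tokens_so_far: int, max_tokens: int, started_at: float, now: float
) -> Progress:
    """Derive %, tok/s and ETA; max_tokens is a ceiling, so the ETA is pessimistic."""
    elapsed = max(now - started_at, 0.0)
    rate = tokens_so_far / elapsed if elapsed > 0 else 0.0
    if max_tokens > 0:
        percent = min(100.0, 100.0 * tokens_so_far / max_tokens)
    else:
        percent = 0.0
    remaining = max(max_tokens - tokens_so_far, 0)
    # No ETA until at least one token has arrived.
    eta = remaining / rate if rate > 0 else None
    return Progress(percent, rate, eta)
```

For example, 250 of 1000 tokens after 10 seconds gives 25%, 25 tok/s, and an ETA of 30 seconds. A model that stops before max_tokens will finish sooner than the ETA suggests.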
Phase UI-2 (React port) 17 tasks · 2026-05-16 → 17
Decision made 2026-05-16: stack swap to React 18 + Vite + TypeScript + Zustand + React Query. Reasoning: append-only token streaming and multi-pane synchronised re-renders showed concrete pain (P-UI.21 stream-token-deltas deferred from UI-1).
Vite scaffold · backend JSON endpoints · shell+chrome · virtualised queue (@tanstack/react-virtual)
Single view + StreamingOutput + StatStrip (60Hz RAF-throttled) · inspector · ModelPicker · DispatchOverlay · Cheatsheet
ChatView re-port · Dockerfile multi-stage · OLD UI DELETE PASS (≥3,000 LoC)
siblings + prompt_hash · CompareView · StackedView · TournamentView + ratings SQLite
daily_stats SQLite table · SSE reconnect with seq + ring buffer · responsive collapse
+HF.1 hotfix 2026-05-17: compare/tournament duplicate-job slug bug (filename collision when N jobs land in same second), stat strip on cards, global CSS reset
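The "SSE reconnect with seq + ring buffer" item can be sketched server-side as a bounded buffer keyed by a monotonically increasing sequence number. This is an illustration, not the shipped code: `EventRing` and `format_sse` are hypothetical names, and it assumes the client reports the last seq it saw on reconnect (browsers' EventSource does this via the Last-Event-ID header when events carry an `id:` field).

```python
from collections import deque
from typing import Optional


class EventRing:
    """Keep the last `capacity` events so a reconnecting client can catch up."""

    def __init__(self, capacity: int = 256) -> None:
        self._events: deque[tuple[int, str]] = deque(maxlen=capacity)
        self._next_seq = 1

    def append(self, data: str) -> int:
        """Store one event and return the seq assigned to it."""
        seq = self._next_seq
        self._events.append((seq, data))
        self._next_seq += 1
        return seq

    def replay_after(self, last_seq: int) -> Optional[list[tuple[int, str]]]:
        """Events with seq > last_seq, or None if the gap fell out of the buffer."""
        oldest = self._events[0][0] if self._events else self._next_seq
        if last_seq + 1 < oldest:
            return None  # client missed evicted events and needs a full refetch
        return [(seq, data) for seq, data in self._events if seq > last_seq]


def format_sse(seq: int, data: str) -> str:
    """Render one single-line event in SSE wire format with its seq as the id."""
    return f"id: {seq}\ndata: {data}\n\n"
```

Returning `None` when the gap is too large keeps the two cases distinct: an empty list means "you are up to date", while `None` means "replay is impossible, reload the job state".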
// bottom line where the plan held, where it bent
Plan held
  • Repo layout (§4): five repos exactly as specced.
  • Phase 0–6 scope: shipped as written; only P2.5/P2.6 (alerts, dashboards) landed late.
  • Non-negotiables (150 LoC/file, pytest only, Pydantic Settings, no os.environ, Conventional Commits): held throughout.
  • Vector store (Qdrant), graph store (Neo4j), MCP framework (FastMCP), tracing (OTel→Alloy→Grafana Cloud): all unchanged.
  • Filesystem-as-queue (§8): unchanged.
  • Public ingress via Cloudflare Tunnel (no open ports): held.
  • Cutover sequencing (P9): runbook executed cleanly.
Plan bent
  • graphiti dropped — required OpenAI Responses API, no free-tier path.
  • 3 of 4 P8 agents held — no concrete trigger for briefing/research/memory.
  • UI rewritten twice — HTMX → terminal skin → React/TS.
  • ~50 unplanned tasks filed under QoL / Skills / UI / UI-2.
  • CI deploy stage deferred indefinitely; manual deploy.sh wins.
  • Davas reachable via Tailscale, not the CF tunnel SPEC §6 named.
  • 7 SPEC drift fixes reconciled 2026-05-14 (bucket names, default model, env vars, services).
  • SPEC.md rotted from day 3 onward — "execution decisions flowed into tasks.md / RUN-SUMMARY.md only." The reconciliation pass closed the gap, but the pattern will recur unless the habit changes.