Run log
Newest first · append-only
2026-05-14
Batch D — registry unification + Discord success notifications (Opus)
P-QoL.4: router.available_models(settings) exposed in shared/providers/router.py — returns _MODEL_PROVIDER entries filtered by configured credentials. services/web/routes/ui.py:list_models now calls it instead of holding a hardcoded list. Smoke-verified on Contabo: gemma4:27b (Davas-routed, not in local /api/tags) now appears on /models alongside dynamic Ollama entries.
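A minimal sketch of the credential filter described above, assuming a model-to-provider table and one settings attribute per provider. Only gemma4:27b and its Davas routing come from this log; the other model names, the _PROVIDER_CREDENTIAL table, and the Settings fields are hypothetical stand-ins, not the contents of shared/providers/router.py.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical table: only gemma4:27b -> davas is taken from the log entry.
_MODEL_PROVIDER: dict[str, str] = {
    "gemma4:27b": "davas",
    "example-groq-model": "groq",
    "example-anthropic-model": "anthropic",
}

# Assumed mapping: the settings attribute that must be set for each provider.
_PROVIDER_CREDENTIAL: dict[str, str] = {
    "davas": "davas_base_url",
    "groq": "groq_api_key",
    "anthropic": "anthropic_api_key",
}


@dataclass
class Settings:
    """Stand-in for the real Pydantic settings object."""
    davas_base_url: Optional[str] = None
    groq_api_key: Optional[str] = None
    anthropic_api_key: Optional[str] = None


def available_models(settings: Settings) -> list[str]:
    """Return model names whose provider has a non-empty credential."""
    return sorted(
        model
        for model, provider in _MODEL_PROVIDER.items()
        if getattr(settings, _PROVIDER_CREDENTIAL[provider], None)
    )
```

With a single table behind both the router and the /models endpoint, a model routed to a remote provider shows up in the UI as soon as that provider's credential is configured, with no second list to keep in sync.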
Bundled fix: extra="ignore" on SettingsConfigDict, so Compose-only .env vars (CLOUDFLARED_TUNNEL_TOKEN, GRAFANA_TEMPO_*) no longer crash Settings(). This cleared 16 router-test errors.
P-QoL.2: JobOutcome(text, tokens, model) record widens handler return type. Executor gains _notify_success mirroring _notify_failure, a time.monotonic() timing wrapper, and _build_output_url (returns None when PUBLIC_BASE_URL unset). Chain handler's in-handler discord.notify deleted. docker-compose.yml worker env forwards PUBLIC_BASE_URL. test_discord_not_called_on_success inverted; +8 success-notify tests; -2 chain-discord tests. 55 worker tests pass.
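The shape of this change can be sketched roughly as follows. The JobOutcome fields match the entry above; the run_timed helper name, the use of a frozen dataclass, and the ?job_id= URL layout are assumptions for illustration, not the executor's actual code.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass(frozen=True)
class JobOutcome:
    text: str
    tokens: int
    model: str


def build_output_url(public_base_url: Optional[str], job_id: str) -> Optional[str]:
    """Return a link to the job output, or None when no base URL is configured."""
    if not public_base_url:
        return None
    # Assumed URL layout; the real route is not stated in this log.
    return f"{public_base_url.rstrip('/')}/?job_id={job_id}"


async def run_timed(
    handler: Callable[[], Awaitable[JobOutcome]],
) -> tuple[JobOutcome, float]:
    """Await the handler and measure its duration from outside the handler."""
    started = time.monotonic()
    outcome = await handler()
    return outcome, time.monotonic() - started
```

Measuring in the executor rather than in each handler keeps handlers free of notification concerns, which is what lets the chain handler's own notify call be deleted.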
Surfaced: (1) Design docs reference SDK shapes that drift: the P-QoL.2 design assumed result.usage.total_tokens, but CompletionResult actually carries flat input_tokens/output_tokens ints. (2) Adding a new setting takes 4 edits, not 3: Pydantic field + Infisical secret + deploy.sh path + Compose environment: forwarding. (3) Patching time.monotonic with iter([t0,t1]) runs out of values mid-test because the asyncio event loop also calls time.monotonic().
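One way around the time.monotonic pitfall is a fake clock that never runs out: it returns the same value on every call and only moves when the test advances it. This is a sketch of that idea, not the fix used in the worker tests; FakeClock and timed are hypothetical names.

```python
import asyncio
import time
from unittest import mock


class FakeClock:
    """Callable clock that can be read any number of times."""

    def __init__(self, start: float = 1000.0) -> None:
        self.now = start

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


async def timed(handler) -> float:
    """Stand-in for the executor's timing wrapper."""
    started = time.monotonic()
    await handler()
    return time.monotonic() - started


def measure_with_fake_clock() -> float:
    clock = FakeClock()

    async def handler() -> None:
        clock.advance(2.5)  # simulate 2.5 s of work

    # The event loop reads the patched clock too, which is harmless here
    # because extra reads do not consume anything.
    with mock.patch("time.monotonic", clock):
        return asyncio.run(timed(handler))
```

Because the elapsed time depends only on explicit advance() calls, the result is the same however many times the event loop consults the clock.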
2026-05-14
Documentation pass — designs, runbook, audits, SPEC reconciliation (Opus, parallel to P7.7)
Phase QoL formalised: six tasks P-QoL.1–P-QoL.6 added to tasks.md. P-QoL.1 (UI link to completed outputs) is new; P-QoL.2–.6 lift previously-parked items into trackable form.
Designs written: platform-docs/design/p-qol-1-ui-output-link.md (URL param job_id, markdown-it-py renderer with html=False, right-pane swap pattern, 7 tests). platform-docs/design/p-qol-2-discord-success-notifications.md (widen handler return to JobOutcome, executor measures duration externally, delete chain.py in-handler notify, optional PUBLIC_BASE_URL, 9 tests).
Runbook written: platform-docs/runbooks/smoke-test.md — 10-step round-trip (tunnel, models, Davas, text job, RAG, KG client, MCP) + connectivity sweep + triage cheatsheet + exit-criteria checklist. KG client construct is the explicit P7.7 exit gate.
SPEC drift reconciliation: 7 fixes to SPEC.md — bucket names davidcockson-* (us-east-1), default model gemma4:e2b, layout additions (routes/ui.py, shared/notifications/), reindexer service, env vars (CLOUDFLARED_TUNNEL_TOKEN, PUBLIC_BASE_URL), CI deploy marked deferred.
Audits: tasks.md staleness pass (pruned ticked-but-verbose entries on P3.6, P4.4, P5.7); memory freshness pass; new memory project_monitoring_deferred.md.
2026-05-13
P7.3–P7.6 RAG + KG complete · UI model-name drift fixed (Sonnet/Opus)
P7.3: Neo4j + graphiti config wired; auth pulled from Infisical.
P7.4: kg_indexer.py + kg_retriever.py — graphiti ingest, Cypher neighbourhood query.
P7.5 (Opus): services/rag/graph.py LangGraph pipeline — decompose → parallel(vector, kg) → rerank → synthesise.
P7.6: nightly re-index cron (02:00 UTC) in reindexer container.
UI fix: /models endpoint was returning vendor-prefixed names (groq/…) that didn't match the router table — fixed in services/web/routes/ui.py (commit e117dba). Root cause: registry duplication between UI and router — tracked as P-QoL.4 for refactor.
Surfaced: Discord success-notifications regression vs old runner — non-chain completions are silent. Tracked as P-QoL.2 with full design.
2026-05-12
P7.2 vector indexer + retriever (Sonnet)
Done: vector_indexer.py chunks vault .md files, batch-embeds via Ollama nomic-embed-text, upserts to Qdrant with deterministic SHA-256 point IDs. vector_retriever.py uses qdrant.query_points() (qdrant-client 1.18 API — search() removed). 30 tests, ruff + mypy --strict clean. MR merged.
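A deterministic point ID can be derived along these lines. What exactly vector_indexer.py hashes is not stated here, so the source path plus chunk index used below is an assumption; the fold into a UUID reflects that Qdrant point IDs must be unsigned integers or UUIDs.

```python
import hashlib
import uuid


def point_id(source_path: str, chunk_index: int) -> str:
    """Derive a stable UUID-formatted ID from a chunk's identity."""
    key = f"{source_path}#{chunk_index}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Use the first 128 bits (32 hex chars) of the digest as the UUID value.
    return str(uuid.UUID(hex=digest[:32]))
```

Re-indexing the same chunk yields the same ID, so an upsert overwrites the existing point instead of creating a duplicate.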
Surfaced: qdrant-client 1.18 replaced QdrantClient.search() with query_points() returning QueryResponse.points.
2026-05-12
P7.1 Qdrant collections init + P6.6 research mode wired (Sonnet)
P7.1: idempotent init_collections.py creates vault, web, memory (768-dim cosine). Uses os.environ directly, which avoids the VAULT_PATH requirement in init containers. MR !21 merged.
P6.6: shared/mcp_client.py MCPToolClient wraps FastMCP StreamableHttpTransport. Research jobs call web_search() via MCP, fall back gracefully when MCP unavailable. Pre-integration audit found vault mount :ro in MCP container silently breaking vault_write since P6.4 — fixed here.
2026-05-12
P6.5 sandboxed exec + P6.4 vault tools + P6.3 fetch + P6.2 search (Sonnet/Opus)
P6.5 (Opus): run_python tool — rlimits (CPU, AS, FSIZE, NOFILE) + scrubbed env + start_new_session=True + optional bubblewrap when bwrap present. 11 tests.
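The combination of rlimits, scrubbed environment, and a new session can be sketched with the standard library alone (Unix only). The limit values, the PATH value, and the -I flag are illustrative assumptions, and the optional bubblewrap wrapping is left out.

```python
import resource
import subprocess
import sys

# Illustrative limits; the real values are not stated in this log.
_LIMITS = {
    resource.RLIMIT_CPU: 5,                  # seconds of CPU time
    resource.RLIMIT_AS: 1024 * 1024 * 1024,  # bytes of address space
    resource.RLIMIT_FSIZE: 1024 * 1024,      # max bytes per written file
    resource.RLIMIT_NOFILE: 64,              # open file descriptors
}


def _apply_limits() -> None:
    """Runs in the child process between fork and exec."""
    for res, wanted in _LIMITS.items():
        _soft, hard = resource.getrlimit(res)
        # Never request more than the inherited hard limit allows.
        limit = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
        resource.setrlimit(res, (limit, limit))


def run_python(code: str, timeout: float = 10.0) -> subprocess.CompletedProcess:
    """Run a snippet in a child interpreter with limits and a minimal env."""
    return subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode
        env={"PATH": "/usr/bin:/bin"},       # nothing inherited from the parent
        preexec_fn=_apply_limits,
        start_new_session=True,              # child leads its own session
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

For example, run_python("import os; print(sorted(os.environ))") shows which variables the child actually sees, a quick check that parent secrets did not leak through.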
P6.4: vault_read/write/list — path traversal blocked via Path.resolve().is_relative_to().
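The traversal check named above can be sketched as a single helper. The function name and the choice of PermissionError are assumptions; Path.is_relative_to requires Python 3.9 or later.

```python
from pathlib import Path


def resolve_in_vault(vault_root: Path, user_path: str) -> Path:
    """Resolve user_path under vault_root, rejecting anything that escapes it."""
    root = vault_root.resolve()
    # resolve() collapses ".." segments and follows symlinks; an absolute
    # user_path replaces root entirely, which the check below also rejects.
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):
        raise PermissionError(f"path escapes vault: {user_path}")
    return candidate
```

Resolving before comparing is the important part: a plain string prefix check would accept inputs such as notes/../../etc/passwd.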
P6.3: fetch_page runs the synchronous trafilatura extraction in asyncio.to_thread; output capped at 50k chars.
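The pattern of pushing a blocking extractor onto a worker thread and capping its output can be sketched as below. The extract parameter stands in for the trafilatura call and fetch_page_text is a hypothetical name; the HTTP fetch itself is omitted.

```python
import asyncio
from typing import Callable, Optional

MAX_CHARS = 50_000


async def fetch_page_text(
    html: str, extract: Callable[[str], Optional[str]]
) -> str:
    """Run a blocking extractor off the event loop and cap the result."""
    # to_thread keeps the event loop responsive while extraction runs.
    text = await asyncio.to_thread(extract, html)
    # Treat a None result (nothing extracted) as empty text.
    return (text or "")[:MAX_CHARS]
```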
P6.2: web_search — ddgs primary (sync wrapped), Tavily fallback via httpx.
Surfaced: from server import mcp created a separate module instance from services.mcp.server — tools registered on different objects. Fixed with full dotted import path.
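The double-import effect can be reproduced in isolation. This sketch builds a throwaway package in a temp directory (all names are hypothetical) and imports the same file under a short name and a dotted name.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def double_import_demo() -> tuple[bool, list, list]:
    """Import one file under two names and compare the resulting modules."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg_xyz"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "demo_server_xyz.py").write_text("tools = []\n")
        # Both the package's parent and the package directory are on
        # sys.path, which is what makes the short import resolvable.
        sys.path[:0] = [tmp, str(pkg)]
        importlib.invalidate_caches()
        try:
            short = importlib.import_module("demo_server_xyz")
            full = importlib.import_module("demo_pkg_xyz.demo_server_xyz")
            short.tools.append("ping")  # register on one copy only
            return short is full, list(short.tools), list(full.tools)
        finally:
            del sys.path[:2]
            for name in (
                "demo_server_xyz",
                "demo_pkg_xyz.demo_server_xyz",
                "demo_pkg_xyz",
            ):
                sys.modules.pop(name, None)
```

The two imports produce distinct module objects, so a tool registered through one name is invisible through the other. Using a single dotted path everywhere avoids the split.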
2026-05-11
P5.7 runner end-to-end green + P6.1 FastMCP server (Sonnet)
P5.7: control.davidcockson.com live. Worker Dockerfile fixed (copy services/+shared/, PYTHONPATH=/app). Secrets: no secrets in .env — all via infisical run --recursive. cloudflared shows unhealthy in docker compose ps but tunnel IS connected (4 edge connections) — healthcheck timing issue only.
P6.1: FastMCP v3 server boots, ping tool, OTel lifespan. 11 tests.
Surfaced: Secrets leaked into chat via grep before cleaning .env — rotate Anthropic, Google AI, Groq, Discord keys. Old uvicorn killed; new runner owns hostname permanently.
2026-05-11
P5.1–P5.6 Web + Chat complete · P4.1–P4.6 Worker complete · P2.2–P2.4 Alloy rolled (Sonnet)
P5: FastAPI factory + lifespan, health probes (Ollama/Davas/Qdrant/Neo4j), SSE streaming, queue submit/list, HTMX UI (chat + queue + model picker + Davas dot), SQLite chat history.
P4: Atomic poller with 2s mtime guard + crash recovery, executor with traceback-in-frontmatter failures, text/chain/research job types, Discord notifications, OTel spans on every provider call.
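A claim step with a 2s mtime guard might look like the sketch below. The pending/running directory layout, the .md job extension, and the function name are assumptions, and crash recovery is not shown.

```python
import os
import time
from pathlib import Path
from typing import Optional

MTIME_GUARD_SECONDS = 2.0


def claim_next_job(
    pending: Path, running: Path, now: Optional[float] = None
) -> Optional[Path]:
    """Move the first settled job file into running/ and return its new path."""
    now = time.time() if now is None else now
    for job in sorted(pending.glob("*.md")):
        target = running / job.name
        try:
            # Skip files modified too recently: the writer may not be done.
            if now - job.stat().st_mtime < MTIME_GUARD_SECONDS:
                continue
            # rename is atomic within one filesystem, so only one poller wins.
            os.rename(job, target)
        except FileNotFoundError:
            continue  # another poller claimed it first
        return target
    return None
```

The mtime guard covers writers that create a file and fill it afterwards, while the rename makes the claim itself race-free.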
P2: Full Grafana Cloud Alloy pipeline (Loki/Mimir/Tempo) deployed to Contabo and homelab. Loki + Mimir shipping; OTLP port remapped to 14317 (Langfuse holds 4317 until P9).
2026-05-10
P3.1–P3.7 Runner foundation + P1.1–P1.6 Infrastructure + P0.1–P0.6 Foundations (Sonnet/Opus)
P3: pyproject.toml + uv, Pydantic Settings v2, provider clients (Ollama/Groq/Gemini/Anthropic), router with Davas health-check rule, OTel SDK → Tempo, docker-compose (7/8 healthy locally), infisical deploy wrapper.
P1: Terraform S3 backend + Cloudflare DNS, Ansible firewall capture (Contabo iptables snapshot), infra-shared roles (docker/tailscale/infisical-agent/alloy), contabo + homelab playbooks (idempotent, --check green).
P0: Infisical secrets confirmed, machine identities verified, AWS IAM confirmed, S3 versioning confirmed, 5 GitLab repos created + CI green, local clones done.