plan-orchestrate — live test + v2 proposal

🟢 shipped & verified live · 🟡 proposal / not yet built · 🔵 human gate (stays manual)

1 · What shipped this run 🟢 live

The parked AI Usage Funnel plan was executed to completion: per-skill and per-category AI spend with unit economics ($/min of TTS audio) on /dash#usage, on branch plan-orchestrate/ai-usage-funnel.
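The unit economics reduce to one division per category. A minimal sketch, assuming each derived ratio is a category's cost divided by that category's unit count; `derive_unit_costs` and the `per_*` keys are hypothetical names, not the fields `app/dash.py` actually emits. A missing or zero unit count yields `None` rather than a divide-by-zero, which matters for quiet windows.

```python
from typing import Optional


def _per_unit(cost_usd: float, units: Optional[float]) -> Optional[float]:
    """Cost per unit, or None when the unit count is missing or zero (no divide-by-zero)."""
    if not units:
        return None
    return cost_usd / units


def derive_unit_costs(cost_usd: float, calls: int,
                      audio_minutes: Optional[float] = None,
                      leads: Optional[int] = None) -> dict[str, Optional[float]]:
    """Hypothetical per-category derivation of $/call, $/min of audio, and $/lead."""
    return {
        "per_call": _per_unit(cost_usd, calls),
        "per_min": _per_unit(cost_usd, audio_minutes),  # meaningful only for the TTS category
        "per_lead": _per_unit(cost_usd, leads),
    }


# 7-day totals reported for the live tab: 1084 calls, $1.85 (no minutes or leads supplied here)
week = derive_unit_costs(1.85, 1084)
print(week)
```

Computing the ratios server-side, as the report describes, leaves the dashboard JS with nothing to do but render numbers.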

Area	Change	Proof
`app/usage.py`	Recovered 5 dropped `PROMPT_KEYS` plus `unknown`; `record(*, category, unit, unit_kind)` keyword-only and backward-compatible; read-time `CATEGORIES` map	AU1, AU2, AU9, AU10
`app/config.py`	TTS audio-output pricing ($20 per 1M output tokens), ordered before `gemini-2.5-pro` so the TTS key wins the prefix match	AU4: `ai_cost(…tts,100,1500)=0.0301`
`app/dash.py`	Token-guarded `POST /dash/api/usage/ingest`; `api_usage` emits `per_category` plus server-side `derived` ($/min, $/lead, $/call)	AU3, AU6, AU7
`app/dash_js.py`	"📊 By category" table + unit column; each table wrapped in an `overflow-x:auto` container (the visual-QA fix)	AU8 + visual-QA
Laptop TTS skill	`synth.py`/`run.py` capture `usage_metadata` and POST it fail-open via the new `_shared/usage_post.py`	AU5
`DASHBOARD_GUIDE.md`	Monthly manual reconciliation recipe (Chrome DevTools MCP, never Puppeteer)	AU11
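The AU4 fix and guardrail H4 both come down to a first-match-wins lookup over an ordered table. A minimal sketch, assuming prices live in an ordered list of (prefix, input $/1M, output $/1M) tuples and modelled on the `ai_cost` call quoted in the AU4 proof (the real signature may differ). Only the $20/1M TTS output rate comes from this report. The TTS model name is a hypothetical stand-in because the report elides the real one, the $1/1M TTS input rate is an inference chosen so the sketch reproduces the quoted `0.0301`, and the `gemini-2.5-pro` rates are placeholders.

```python
# Ordered pricing table: (model-name prefix, USD per 1M input tokens, USD per 1M output tokens).
# First prefix match wins, so a more specific key must come BEFORE any shorter key that prefixes it.
PRICES: list[tuple[str, float, float]] = [
    ("gemini-2.5-pro-example-tts", 1.00, 20.00),  # hypothetical TTS key; $20/1M output from the report
    ("gemini-2.5-pro", 1.25, 10.00),              # placeholder rates for illustration only
]


def check_prefix_order(prices: list[tuple[str, float, float]]) -> None:
    """Guard (H4): fail if a later key starts with an earlier key, since it could never match."""
    for i, (earlier, _, _) in enumerate(prices):
        for later, _, _ in prices[i + 1:]:
            if later.startswith(earlier):
                raise ValueError(f"{later!r} is shadowed by earlier prefix {earlier!r}")


def ai_cost(model: str, tokens_in: int, tokens_out: int,
            prices: list[tuple[str, float, float]] = PRICES) -> float:
    """Cost in USD using the first prefix that matches the model name."""
    for prefix, usd_in, usd_out in prices:
        if model.startswith(prefix):
            return (tokens_in * usd_in + tokens_out * usd_out) / 1_000_000
    raise KeyError(f"no price for model {model!r}")


check_prefix_order(PRICES)  # run at import time so a bad reorder fails loudly
print(ai_cost("gemini-2.5-pro-example-tts", 100, 1500))  # prints 0.0301
```

Swapping the two rows makes `check_prefix_order` raise; without the guard, the TTS call would silently be priced at the `gemini-2.5-pro` rates.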

QA: the project skill ai-usage-qa passed all of AU1–AU11 live (v088). Visual QA at 1280 / 768 / 390 px passed after one fix: a 7-column table overflowed on mobile, so each table was wrapped in an overflow-x:auto container, redeployed (v089), and re-verified. The live tab shows real spend: 1084 calls and $1.85 over 7 days; TTS $/min = $0.0301.
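Live checks like these cannot assert exact values (guardrail H3): `record()` accumulates, so a live counter reflects every earlier seed plus any real traffic. A minimal sketch of a delta-based live assert, assuming the check can read the counter before and after seeding; `assert_grew_by_at_least` is a hypothetical helper name.

```python
def assert_grew_by_at_least(before: float, after: float, seeded: float, tol: float = 1e-9) -> None:
    """Live-safe check: the counter must rise by at least what this run seeded.

    Other writers (earlier seeds, real traffic) may add more, so equality is never required.
    Exact values (e.g. unit == 42) belong in local unit tests against a fresh store.
    """
    grew = after - before
    if grew + tol < seeded:
        raise AssertionError(f"expected growth >= {seeded}, got {grew} ({before} -> {after})")


assert_grew_by_at_least(before=42, after=84, seeded=42)  # second seed on the same store: passes
assert_grew_by_at_least(before=42, after=90, seeded=42)  # extra real traffic: still passes
```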

2 · Verdict on the orchestrator: feasible

The single-prompt loop is feasible today for the plan→techspec→QA-scaffold→execute spine: this run just exercised it end to end. The blocker is not the AI but the seam with the live environment.

The single biggest blocker: snap_deploy.sh is not isolated per git branch (there is one live Modal app), and the laptop store (.tmp/deals.json) is not the prod store (modal.Dict), so all live QA seeding must go through the deployed ingest endpoint. Every friction this run hit was solved by hand and never written back into the reusable skills, so the next run would have to re-derive each one. v2's durable win is removing that re-derivation tax, not rewriting the orchestration.

3 · The proposed v2 loop 🟡 design

Not a rewrite: the existing orchestrator plus four additions. The human touches the loop exactly twice.

① One prompt → ≤4 questions 🔵

Phase-0 orient (graphify + chat-graph + PROJECT_STATE), then the upfront intent batch is written to intent.json. Human stop #1. (On this run, Instance 1 needed zero rounds; keep this stage verbatim.)

② Plan → techspec → QA scaffold

File-bus ⇄ re-spawn (Opus per instance; the orchestrator self-answers from the graph plus Sonnet research). New: the plan declares target_kind: card|state so the QA scaffold fits API-driven tabs.

③ Execute + visual-qa-ultra 🔵

Edits → snap_deploy → seed via ingest → checks green; inherits the pre-flight defaults pack. Audit mode always runs automatically; the journey grade is gated on a signed EBO. Human stop #2.

④ Emit handoff → next prompt

The closing stage emits a house-style HANDOFF.html (with an honest Limitations section) carrying a pastable next-instance prompt, so the next instance picks up the next-highest-leverage task. This is the flywheel.
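Stage ② only stays hands-off if its question exchange is bounded. A minimal sketch of the H10 caps (5 rounds, research fan-out 3), assuming two hypothetical callables: `self_answer` stands in for the orchestrator's graph lookup and `research` for a Sonnet research helper, each returning an answer or `None`. Whatever is still open when the cap is hit is returned for escalation instead of looping.

```python
from typing import Callable, Optional

MAX_ROUNDS = 5  # H10: cap on question/answer rounds
MAX_FANOUT = 3  # H10: cap on research calls per round (run sequentially here for simplicity)

Answerer = Callable[[str], Optional[str]]


def resolve_questions(questions: list[str], self_answer: Answerer,
                      research: Answerer) -> tuple[dict[str, str], list[str]]:
    """Return (answers, unresolved); unresolved questions go back to the human."""
    answers: dict[str, str] = {}
    researched: set[str] = set()  # never research the same question twice
    pending = list(questions)
    for _ in range(MAX_ROUNDS):
        if not pending:
            break
        # Cheap path first: answer from the project graph / state files.
        still_open = []
        for q in pending:
            a = self_answer(q)
            if a is None:
                still_open.append(q)
            else:
                answers[q] = a
        # Expensive path: research at most MAX_FANOUT not-yet-researched questions.
        for q in [q for q in still_open if q not in researched][:MAX_FANOUT]:
            researched.add(q)
            a = research(q)
            if a is not None:
                answers[q] = a
        pending = [q for q in still_open if q not in answers]
    return answers, pending
```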

4 · Pre-flight hardening: bake the frictions in (10 guardrails)

Guardrails distilled from this run so they never have to be re-derived, highest-leverage first.

#	Guardrail	Why (this run)
H2 ⭐	Never use `networkidle`: use `domcontentloaded` + a ready-selector + retry, and keep navigation inside each check's try	It aborted the whole `--all` survey, and it is STILL in the global `qa-harness` (`assert.py:207`, `capture.py:230`), so fixing it once helps every future project
H1	SSL context for laptop POSTs (certifi, with an unverified fallback)	macOS Python has no CA bundle → `CERTIFICATE_VERIFY_FAILED`
H3	Accumulation-robust live asserts; exact values only in local unit tests	`record()` accumulates, so "unit==42" held only after a single seed
H4	Prefix-match ordering for prefix-keyed lookups, plus a guard assert	`'…pro-preview-tts'.startswith('gemini-2.5-pro')` is True, so list order decides which price applies
H7	Seed live data via the deployed ingest endpoint, never `flows.store()`	Local store ≠ prod store
H8	Never run two deploys or `run_matrix` runs concurrently against the one Modal app	One shared live app; concurrent same-tree edits merge-conflict
H5/H6/H9/H10	`$status` is reserved in zsh · on a 529, inspect disk before assuming failure · read live source for prices/SDK shapes · cap rounds at 5 and research fan-out at 3	Each was a real event in this run or the planning run
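Guardrail H2 in code form. A page that keeps polling may never reach `networkidle`, so each check instead navigates to `domcontentloaded`, waits for one ready selector, retries, and catches its own errors so one bad page cannot abort the `--all` survey. A minimal sketch with navigation and readiness passed in as callables so it runs without a browser; with Playwright's sync API they would wrap `page.goto(url, wait_until="domcontentloaded")` and `page.wait_for_selector(selector)`. `run_check`, `run_survey`, and the retry/backoff defaults are hypothetical, not the `qa-harness` code.

```python
import time
from typing import Callable

Step = Callable[[], None]


def run_check(name: str, navigate: Step, wait_ready: Step,
              attempts: int = 3, backoff_s: float = 1.0) -> tuple[str, bool, str]:
    """One survey check. Navigation lives INSIDE the try, so a failure is recorded, not raised."""
    last_error = ""
    for attempt in range(1, attempts + 1):
        try:
            navigate()    # e.g. page.goto(url, wait_until="domcontentloaded")
            wait_ready()  # e.g. page.wait_for_selector("#usage-table")  (hypothetical selector)
            return name, True, ""
        except Exception as exc:  # any nav/timeout error counts as a failed attempt
            last_error = f"attempt {attempt}: {exc!r}"
            if attempt < attempts:
                time.sleep(backoff_s * attempt)  # linear backoff between retries
    return name, False, last_error


def run_survey(checks: list[tuple[str, Step, Step]], **kwargs) -> list[tuple[str, bool, str]]:
    """Run every check; one failing page never stops the rest."""
    return [run_check(name, nav, ready, **kwargs) for name, nav, ready in checks]
```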

5 · The self-continuing handoff 🟡

Every run ends by emitting a human-review doc that contains the prompt for the next run. Here is the filled-in next-instance prompt the proposal generated, for the next-highest-leverage task (instrumenting graphify spend, reusing the rails this run built):

/plan-orchestrate

FEATURE: Instrument the `graphify` laptop skill's Gemini spend into the AI Usage tab — the
last uninstrumented high-volume AI spender. The AI Usage Funnel (branch
plan-orchestrate/ai-usage-funnel, AU1–AU11 green @ v088) already built the rails:
- token-guarded POST /dash/api/usage/ingest (fail-open),
- the shared poster ~/.claude/skills/_shared/usage_post.py,
- read-time category map app/usage.py:CATEGORIES + the "By category" tab block.
graphify was DEFERRED (ADR-0005 / HANDOFF §9.4) because whether its CLI exposes Gemini token
counts cheaply is UNCONFIRMED — RESOLVE THAT FIRST. If counts are exposed, POST
{source:'graphify', category:'Graphify', model, in, out} fail-open; else log a per-run estimate.

CONSTRAINTS (hard): ONE Gemini key; laptop POST fail-open; ZZ-gated + send-blocklisted QA via a
new ai-graphify-qa skill (state target → seed --via-ingest, no exact-count live asserts); light
theme; deploy via ./scripts/snap_deploy.sh to the ONE live Modal app (sequence, no concurrent
run_matrix); human gate intact.

Run the chain hands-off on branch plan-orchestrate/graphify-usage, /visual-qa-ultra it (audit
always; journey only if I sign the EBO), then emit the house-style HANDOFF.html with the next prompt.

6 · Build-first & honest risks

Build order: the smallest changes that gain the most autonomy, each paired with a place where a single-prompt run still needs you.

Build first	Effort	Honest risk that remains
H2: patch `networkidle` in the global qa-harness	½ day	EBO signature — a green journey grade requires a human sign-off by design; autonomy caps at `PASSED_WITH_GAPS` until then 🔵
Closing handoff-emit stage (`emit_handoff.py` → HANDOFF.html + next prompt)	1 day	A thin or wrong brief ships a wrong feature confidently; the upfront questionnaire is the only real defence
`target_kind` card/state branch in `build-qa-skill`	1 day	Deploys are not reversible, so autonomy rests entirely on ZZ-gate + send-blocklist + AUTOSEND-off
Pre-flight defaults pack (promote seeder/check patterns)	½ day	Price / external-API drift silently corrupts unit economics with green checks (H9 mitigates)
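The closing handoff-emit stage is mostly templating. A minimal sketch of what `emit_handoff.py` might do, assuming a plain HTML page with Shipped, Limitations, and Next-instance-prompt sections (the real house style is not specified here); `render_handoff` and `emit_handoff` are hypothetical names. It refuses to emit without at least one limitation, and it HTML-escapes the prompt inside `<pre>` so the rendered text copies back out verbatim.

```python
import html
from pathlib import Path


def _bullets(items: list[str]) -> str:
    return "\n".join(f"<li>{html.escape(item)}</li>" for item in items)


def render_handoff(title: str, shipped: list[str], limitations: list[str], next_prompt: str) -> str:
    """Build the handoff page; an empty Limitations section is treated as an error."""
    if not limitations:
        raise ValueError("a handoff must state at least one honest limitation")
    t = html.escape(title)
    return (
        "<!doctype html>\n"
        f'<html><head><meta charset="utf-8"><title>{t}</title></head><body>\n'
        f"<h1>{t}</h1>\n"
        f"<h2>Shipped</h2>\n<ul>\n{_bullets(shipped)}\n</ul>\n"
        f"<h2>Limitations</h2>\n<ul>\n{_bullets(limitations)}\n</ul>\n"
        # <pre> keeps the prompt's line breaks so it can be copied and pasted as-is
        f"<h2>Next-instance prompt</h2>\n<pre>{html.escape(next_prompt)}</pre>\n"
        "</body></html>\n"
    )


def emit_handoff(path: str, **sections) -> Path:
    """Write the rendered page to disk and return its path."""
    out = Path(path)
    out.write_text(render_handoff(**sections), encoding="utf-8")
    return out
```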

Source artifacts: plans/plan-orchestrate-v2-proposal/PROPOSAL.md (full) · plans/ai-usage-funnel/ (the shipped plan + ADRs + checklist) · .claude/skills/ai-usage-qa/ (the QA skill). Analysis written by a spawned Opus agent; all file:line citations verified against the working tree. Nothing is committed yet: the work sits on branch plan-orchestrate/ai-usage-funnel, and deploys v088/v089 are live.