What each scheme actually cost to run.

Four runs of the same scene with the same characters. Two used the lean Python coordinator path against the API directly (006a on Anthropic Sonnet 4.6, 007 on OpenRouter / DeepSeek V4 Pro). Two were orchestrated inside a Claude Code session (006b = subagent scheme, 006c = monolithic) — both ran on Opus 4.7, not the intended Sonnet 4.6, because the CC session's default model was Opus at the time. Two confounders to disentangle: the model billed and the architecture doing the orchestration.

01 What we actually paid

Sources of truth: the Anthropic console for 006a and the OpenRouter console for 007. The two CC runs do not appear in the Anthropic console; their token counts are reconstructed from the local CC session JSONL telemetry (parent session plus all spawned Task sub-agents) and priced at Anthropic Opus list rates.

Exp	Scheme	Model billed	Turns / char	Tokens (total)	Cost
006a	python	claude-sonnet-4-6	15 / 14	~2.50 M	$0.80
006b	subagent	claude-opus-4-7 *	7 / 7	26.45 M	$134.81
006c	monolithic	claude-opus-4-7 *	23 / 23	4.43 M	$22.97
007	python	deepseek/deepseek-v4-pro	9 / 8	~0.97 M	$0.79

* 006b and 006c were intended to run on claude-sonnet-4-6, but the CC session was configured for Opus when the runs executed. Each run's notes.md records the discrepancy.

02 Normalised to Opus 4.7: strips the model confounder, keeps architecture

Assume all four runs had been billed at Anthropic Opus 4.7 list rates. 006a scales up by 5.00× (Opus is uniformly 5.00× Sonnet across input, cache, and output buckets). 007 is re-costed against its OpenRouter token mix, assuming the same cache_control discipline lib/run_dialogue.py uses.

Exp	Scheme	Actual bill	Cost at Opus 4.7	× vs 006a
006a	python	$0.80	$4.00	1.00×
006b	subagent	$134.81	$134.81	33.70×
006c	monolithic	$22.97	$22.97	5.74×
007	python	$0.79	~$5.65	~1.41×

At a constant model, the two python-coordinator runs sit at a $4.00–$5.65 floor; the two CC-orchestrated runs sit 5.74× (006c, $22.97) to 33.70× (006b, $134.81) above 006a. The model isn't the source of the gap. The architecture is.
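The Opus normalisation is plain arithmetic. A minimal Python sketch, assuming (as this section does) that Opus 4.7 list rates are exactly 5.00× Sonnet 4.6 in every bucket; 007 is not derived here but plugged in from its separate token-mix re-cost. The helper name `at_opus` is illustrative only.

```python
# Sketch of the section-02 normalisation (illustrative helper names).
# Assumption taken from the text: Opus 4.7 list rates are exactly 5.00x
# Sonnet 4.6 list rates in every bucket, so one multiplier converts a bill.
OPUS_OVER_SONNET = 5.00

# Actual bills per scene-run, from the section-01 table.
ACTUAL = {
    "006a": ("sonnet", 0.80),
    "006b": ("opus", 134.81),
    "006c": ("opus", 22.97),
}
# 007 was billed on DeepSeek, so it cannot be scaled; its Opus figure comes
# from a separate re-cost of its token mix (approximate).
RECOSTED_AT_OPUS = {"007": 5.65}


def at_opus(model: str, bill: float) -> float:
    """Express an actual bill at Opus 4.7 list pricing."""
    return bill * OPUS_OVER_SONNET if model == "sonnet" else bill


normalised = {exp: at_opus(model, bill) for exp, (model, bill) in ACTUAL.items()}
normalised.update(RECOSTED_AT_OPUS)

baseline = normalised["006a"]
for exp in sorted(normalised):
    cost = normalised[exp]
    print(f"{exp}: ${cost:.2f} at Opus, {cost / baseline:.2f}x vs 006a")
```

Dividing by the 006a baseline after conversion is what removes the model confounder: every run is priced on the same rate card before the ratios are taken.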

03 Normalised to Sonnet 4.6: what the study was supposed to cost

Same exercise at Sonnet rates instead of Opus. This approximates the bill that would have appeared had the CC sessions not been misconfigured to Opus, assuming a Sonnet run would have consumed the same tokens. Sonnet list: $3.00 input / $3.75 cache-write-5m / $6.00 cache-write-1h / $0.30 cache-read / $15.00 output, per M tokens.

Exp	Scheme	Actual bill	Cost at Sonnet 4.6	× vs 006a
006a	python	$0.80	$0.80	1.00×
006b	subagent	$134.81	$26.96	33.70×
006c	monolithic	$22.97	$4.59	5.74×
007	python	$0.79	~$1.13	~1.41×

The architecture ratios are unchanged from the Opus-normalised view (they must be: Opus is a uniform 5.00× multiplier across every bucket). What's new is the absolute scale: at Sonnet rates, even the worst-case architecture (subagent via CC) costs $26.96 per scene-run. That is expensive relative to the python path but not catastrophic if used selectively.
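The invariance claim can be checked directly from the two rate cards. The sketch below uses the per-million-token rates quoted in this write-up; the two usage mixes are hypothetical and exist only to show that, under a uniform multiplier, the cost ratio between any two runs does not depend on which card is applied.

```python
# Per-million-token list rates as quoted in this write-up
# (Sonnet 4.6 in section 03, Opus 4.7 in section 05).
SONNET_4_6 = {"input": 3.00, "cache_write_5m": 3.75, "cache_write_1h": 6.00,
              "cache_read": 0.30, "output": 15.00}
OPUS_4_7 = {"input": 15.00, "cache_write_5m": 18.75, "cache_write_1h": 30.00,
            "cache_read": 1.50, "output": 75.00}

# Opus/Sonnet ratio per bucket; the text says it is 5.00x everywhere.
bucket_ratios = {b: OPUS_4_7[b] / SONNET_4_6[b] for b in SONNET_4_6}
uniform = all(abs(r - 5.0) < 1e-9 for r in bucket_ratios.values())


def cost(usage_mtok: dict[str, float], rates: dict[str, float]) -> float:
    """Dollar cost of a usage mix given in millions of tokens per bucket."""
    return sum(mtok * rates[bucket] for bucket, mtok in usage_mtok.items())


# Two hypothetical usage mixes (not taken from any run): a lean one and a
# cache-read-heavy one. With a uniform multiplier, their cost ratio is the
# same under either rate card.
lean = {"input": 0.20, "cache_read": 2.00, "output": 0.05}
heavy = {"cache_write_1h": 1.00, "cache_read": 20.00, "output": 0.50}
ratio_at_sonnet = cost(heavy, SONNET_4_6) / cost(lean, SONNET_4_6)
ratio_at_opus = cost(heavy, OPUS_4_7) / cost(lean, OPUS_4_7)
print(f"uniform: {uniform}; ratio {ratio_at_sonnet:.3f} (Sonnet) vs {ratio_at_opus:.3f} (Opus)")
```

If the multiplier differed by bucket, the ratios would shift with the usage mix, and the Opus and Sonnet tables would no longer agree on the "× vs 006a" column.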

04 The picture, at Sonnet rates

Log scale, because the four runs span from $0.80 to $26.96 — a 33.70× range.

The two architectural families are visually distinct on the log axis. The python coordinator runs cluster around $1.00 ($0.80 and $1.13); the CC-orchestrated runs sit well above, at $4.59 (monolithic) and $26.96 (subagent). Model choice (Sonnet vs DeepSeek vs Opus) moves cost by 1.41× to 5.00×. Architecture moves cost by 5.74× (monolithic) to 33.70× (subagent).

05 Where the money went (CC runs)

Per-bucket decomposition at the actually-billed Opus rates. Both CC runs show the same shape: a small fraction is the actor work; most of the cost is the parent CC session doing tool-loop bookkeeping (reading docs, composing per-turn briefs, narrating routing decisions).

Bucket	006b tokens	006b cost	006c tokens	006c cost	Opus rate ($/M tokens)
Input (fresh)	<0.01 M	$0.00	<0.01 M	$0.00	$15.00
Cache write, 5-min TTL	0.53 M	$9.86	0.29 M	$5.35	$18.75
Cache write, 1-hour TTL	1.27 M	$38.17	0.23 M	$6.78	$30.00
Cache read	23.97 M	$35.96	3.85 M	$5.77	$1.50
Output	0.68 M	$50.82	0.07 M	$5.07	$75.00
Total	26.45 M	$134.81	4.43 M	$22.97
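Each cost cell above is tokens × rate for its bucket. A minimal sketch of that pricing step, with illustrative function names. `price_buckets` expects exact token counts (for instance from the JSONL telemetry); the table's token figures are rounded to 0.01 M, so feeding them in reproduces the dollar column only to within that rounding. The loop at the end ranks 006b's buckets by share of the bill using the table's own dollar figures.

```python
# Opus 4.7 list rates ($ per million tokens) from the table above.
OPUS_RATES = {
    "input": 15.00,
    "cache_write_5m": 18.75,
    "cache_write_1h": 30.00,
    "cache_read": 1.50,
    "output": 75.00,
}


def price_buckets(tokens: dict[str, int], rates: dict[str, float]) -> dict[str, float]:
    """Dollar cost per bucket: raw token count x ($ per M tokens) / 1,000,000."""
    return {bucket: tokens.get(bucket, 0) * rate / 1_000_000
            for bucket, rate in rates.items()}


def cost_shares(costs: dict[str, float]) -> dict[str, float]:
    """Fraction of the total bill attributable to each bucket."""
    total = sum(costs.values())
    return {bucket: cost / total for bucket, cost in costs.items()}


# 006b's per-bucket dollar figures, copied from the table.
costs_006b = {"input": 0.00, "cache_write_5m": 9.86, "cache_write_1h": 38.17,
              "cache_read": 35.96, "output": 50.82}
for bucket, share in sorted(cost_shares(costs_006b).items(), key=lambda kv: -kv[1]):
    print(f"{bucket:>15}: {share:6.1%}")
```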

006b output spend

$50.82

Opus output narrating routing decisions in the parent CC session — 63.53× the entire 006a bill on its own.

006b cache reads

23.97 Mtokens

The growing conversation context, re-read every turn. The per-turn loop compounds.

006c monolithic premium

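28.71× vs 006a (actual bills)

Even without the routing loop, the CC session's playbook reads and parse step put the bill an order of magnitude above a lean coordinator's. Part of that gap is the Opus billing: at a constant model the monolithic premium is 5.74×.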

06 Practical takeaways

Scheme=python is the production architecture. At $0.80–$1.13 per scene-run (Sonnet rates), it scales: ten scenes cost on the order of $10.00; a hundred, on the order of $100.00. The Python coordinator's cost is mostly the actor work; there's no orchestration layer to bill.
Scheme=subagent via CC is a development architecture. $26.96 per scene-run at Sonnet rates is fine for occasional runs while tuning actor system prompts or scene briefs, when the adaptive CC tool-loop is genuinely useful. Don't use it for batch experiments.
Scheme=monolithic via CC is in the middle. $4.59 per scene-run at Sonnet rates is plausible for ablation experiments where a cheap, quick "what if there were no isolation" comparison is what you want.
The model is a smaller lever than the architecture. Opus → Sonnet saves 5.00×. Python → CC orchestration costs 5.74× (monolithic) or 33.70× (subagent). If cost matters, fix the architecture first.
The cost difference is not a quality signal. A more expensive run is not automatically a better run. The audit verdicts (cross-character leakage, locked-fact compliance, voice fidelity) live separately in each experiment's audit.md.
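Extrapolating these per-scene figures to a batch is linear arithmetic. A minimal sketch, assuming every scene-run costs what the single measured run cost (no cross-scene cache reuse and no variation between scenes, neither of which this study measured); `batch_budget` is an illustrative name, not part of lib/run_dialogue.py.

```python
# Per-scene-run cost at Sonnet 4.6 rates, from the section-03 table.
# The python range spans 006a ($0.80) and 007's re-cost (~$1.13).
PER_SCENE_AT_SONNET = {
    "python": (0.80, 1.13),
    "monolithic": (4.59, 4.59),
    "subagent": (26.96, 26.96),
}


def batch_budget(scheme: str, n_scenes: int) -> tuple[float, float]:
    """Low/high dollar estimate for n scene-runs.

    Assumes cost scales linearly with scene count, i.e. each scene-run costs
    what the single measured run cost, with no cross-scene cache reuse.
    """
    low, high = PER_SCENE_AT_SONNET[scheme]
    return low * n_scenes, high * n_scenes


for scheme in PER_SCENE_AT_SONNET:
    low, high = batch_budget(scheme, 100)
    print(f"{scheme:>10} x100: ${low:,.2f} to ${high:,.2f}")
```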

Methodology — how these figures were obtained

006a and 007: token counts and costs are pulled from the API providers' own console logs (Anthropic console for 006a, OpenRouter console for 007). Bills reported by the user.

006b and 006c: the Anthropic console has no record of these runs. They were executed by a Claude Code session, which records every API call's usage object (input_tokens, cache_creation_input_tokens with its per-TTL cache_creation breakdown, cache_read_input_tokens, output_tokens) in the session's local JSONL transcript: ~/.claude/projects/<project>/<session>.jsonl for parent calls and ~/.claude/projects/<project>/<session>/subagents/agent-*.jsonl for spawned Task sub-agents. Totals are summed across the parent and all sub-agents for each experiment session. Costs apply Anthropic Opus 4.7 list rates.
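A minimal sketch of the summation step. It assumes each transcript line is a JSON object and that assistant entries carry the API usage object under `message.usage`; that layout, the deduplication by message id, and the helper names are assumptions about Claude Code's transcript format rather than anything documented here. Cache writes are split by TTL when the usage object includes the `cache_creation` breakdown; otherwise they are booked at the 5-minute rate.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterable, Iterator


def session_files(project_dir: Path, session_id: str) -> list[Path]:
    """Parent transcript plus every spawned Task sub-agent transcript."""
    parent = project_dir / f"{session_id}.jsonl"
    subagents = sorted((project_dir / session_id / "subagents").glob("agent-*.jsonl"))
    return [parent, *subagents]


def iter_lines(paths: Iterable[Path]) -> Iterator[str]:
    """Yield raw lines from each transcript file in turn."""
    for path in paths:
        with path.open(encoding="utf-8") as fh:
            yield from fh


def sum_usage(lines: Iterable[str]) -> Counter:
    """Sum token usage per billing bucket across transcript lines."""
    totals: Counter = Counter()
    seen_ids: set[str] = set()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        entry = json.loads(raw)
        message = entry.get("message") if isinstance(entry, dict) else None
        if not isinstance(message, dict) or "usage" not in message:
            continue  # user turns, tool results, metadata lines, etc.
        msg_id = message.get("id")
        if msg_id is not None:
            # Assumption: one API response may be logged on several lines
            # carrying identical usage, so count each message id once.
            if msg_id in seen_ids:
                continue
            seen_ids.add(msg_id)
        usage = message["usage"]
        totals["input"] += usage.get("input_tokens") or 0
        totals["cache_read"] += usage.get("cache_read_input_tokens") or 0
        totals["output"] += usage.get("output_tokens") or 0
        by_ttl = usage.get("cache_creation") or {}
        if by_ttl:
            totals["cache_write_5m"] += by_ttl.get("ephemeral_5m_input_tokens") or 0
            totals["cache_write_1h"] += by_ttl.get("ephemeral_1h_input_tokens") or 0
        else:
            # No TTL breakdown: book all cache writes at the cheaper 5-minute
            # rate, which makes the resulting cost a lower bound.
            totals["cache_write_5m"] += usage.get("cache_creation_input_tokens") or 0
    return totals
```

Usage would be along the lines of `sum_usage(iter_lines(session_files(project_dir, session_id)))`, with `project_dir` and `session_id` filled in from the `<project>` and `<session>` placeholders above; the resulting bucket totals can then be priced against the Opus rate card.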

Normalisation: Opus normalisation multiplies a Sonnet bill by 5.00× (Anthropic prices Opus at uniformly 5.00× Sonnet across input, cache, and output buckets). Sonnet normalisation divides an Opus bill by 5.00×. 007's normalisation re-costs its OpenRouter token mix (~0.93 M input, ~0.04 M output) against Anthropic list rates assuming lib/run_dialogue.py-style cache_control on ~88% of input. The mid-point estimates ($5.65 at Opus, $1.13 at Sonnet) sit inside a range that reflects how much of the input would actually have been cache-eligible.
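The 007 re-cost can be sketched as a function of the cache-eligible share of input. This is a simplified model and an assumption, not necessarily what produced the $5.65 / $1.13 mid-points: the cached share is billed at the cache-read rate, the rest at the fresh-input rate, and cache-write premiums are ignored. Because the ~0.93 M / ~0.04 M token counts are rounded (a 0.005 M change in output alone moves the Opus figure by about $0.38), plugging them in lands near, but not exactly on, the stated mid-points.

```python
# $ per million tokens, as quoted in this write-up. Cache-write rates are
# omitted because this simplified model ignores cache-write premiums.
RATES = {
    "sonnet-4.6": {"input": 3.00, "cache_read": 0.30, "output": 15.00},
    "opus-4.7": {"input": 15.00, "cache_read": 1.50, "output": 75.00},
}


def recost(input_mtok: float, output_mtok: float, cached_share: float,
           rates: dict[str, float]) -> float:
    """Re-price a token mix (in millions) at the given list rates.

    `cached_share` of the input is billed at the cache-read rate and the
    remainder at the fresh-input rate (simplifying assumption).
    """
    if not 0.0 <= cached_share <= 1.0:
        raise ValueError("cached_share must be between 0 and 1")
    fresh = input_mtok * (1.0 - cached_share) * rates["input"]
    cached = input_mtok * cached_share * rates["cache_read"]
    return fresh + cached + output_mtok * rates["output"]


# Sweep the cache-eligible share around the ~88% assumption, using 007's
# rounded mix (~0.93 M input, ~0.04 M output).
for share in (0.80, 0.88, 0.95):
    opus = recost(0.93, 0.04, share, RATES["opus-4.7"])
    sonnet = recost(0.93, 0.04, share, RATES["sonnet-4.6"])
    print(f"cached {share:.0%}: ${opus:.2f} at Opus 4.7, ${sonnet:.2f} at Sonnet 4.6")
```

The sweep makes the range explicit: a higher cache-eligible share lowers the estimate, and the Opus figure stays exactly 5× the Sonnet one at every share.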