Topology — how the browser and the local backend fit together
Megaprompt only, for now. The interactive panel is built on the single-model engine, which is more efficient and makes it far easier to change the format, the cast size, or the depth of the deliberation. The Python (isolated) engine and self-election bids remain the project's batch research thesis; they are deferred here.
A local Python server holds one running megaprompt context per session: it assembles the prompt, parses the model's output into the confer and the responses, and logs every event. It can't live on Cloudflare Pages because it is a long-running, stateful SSE process. When the session closes it writes the same artefacts the batch schemes emit, so viewer, index, and audit are untouched.
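As a rough illustration of the per-session state and the SSE framing described above, here is a minimal sketch. The names `Session`, `SESSIONS`, `get_session`, and `sse_frame` are hypothetical, and the event names and JSON payload shape are assumptions; only the `event:` / `data:` wire format comes from the Server-Sent Events standard.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Session:
    """One running megaprompt context plus its event log (hypothetical shape)."""
    context: list[str] = field(default_factory=list)   # prompt pieces, in order
    events: list[dict] = field(default_factory=list)   # logged events for the session artefacts


# In-memory store: acceptable for a localhost researcher tool, lost on restart.
SESSIONS: dict[str, Session] = {}


def get_session(session_id: str) -> Session:
    """Return the session for this id, creating it on first use."""
    return SESSIONS.setdefault(session_id, Session())


def sse_frame(event: str, payload: dict) -> str:
    """Format one Server-Sent Events message.

    json.dumps escapes newlines, so the payload always fits on a single
    `data:` line; the blank line terminates the message.
    """
    data = json.dumps(payload, ensure_ascii=False)
    return f"event: {event}\ndata: {data}\n\n"
```

A streaming handler would write `sse_frame("confer", {...})` or `sse_frame("speak", {...})` to the open response as each parsed line becomes available.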
One human turn, step by step
Two visible layers — there's no private <think> anymore; the one model stages everything. Confer (the panel's internal back-and-forth → side panel) → respond (one message per character → main chat).
The human types a message, optionally picking an @persona. The server appends the message to the running context as From the floor: "…" and logs a human-speak event. If the message is @-addressed → the model answers as that one persona and the confer is skipped. Otherwise:
megaprompt · confer, then respond
The single model, holding the whole cast, writes the internal deliberation: the three characters reacting to each other in a flowing back-and-forth of 3–4 exchanges (no fixed rounds).
It settles a relevance-led order inside that confer (the character the turn most implicates leads), then writes one response per character — each follower answering the line actually delivered before it.
The response parser splits the output: deliberation lines → the side panel; speak lines → the main chat.
The server streams them over SSE: the side panel (drawer) fills with the confer, and the main chat fills with one message per character.
Floor returns to the human. (The model can have a character pass — say nothing this turn — so it never forces a flat chorus.)
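The turn steps above can be sketched roughly as follows. The source does not specify how the model tags its output, so the `CONFER Name: text` / `SPEAK Name: text` line format is purely an assumption for illustration, as are the helper names `floor_line`, `addressed_persona`, and `split_output`.

```python
import re
from typing import Optional

# Hypothetical line format (an assumption, not the project's actual markup):
#   CONFER Name: text   -> deliberation line, shown in the side panel
#   SPEAK Name: text    -> response line, shown in the main chat
LINE_RE = re.compile(r"^(CONFER|SPEAK)\s+([^:]+):\s*(.*)$")


def floor_line(text: str) -> str:
    """Wrap the human's message the way it is appended to the running context."""
    return f'From the floor: "{text}"'


def addressed_persona(text: str, cast: list[str]) -> Optional[str]:
    """Return the @-addressed cast member, or None when the turn goes to the whole panel."""
    match = re.search(r"@(\w+)", text)
    if match:
        for name in cast:
            if name.lower() == match.group(1).lower():
                return name
    return None


def split_output(raw: str) -> tuple[list[dict], list[dict]]:
    """Split model output into (confer lines, speak lines), preserving order."""
    confer: list[dict] = []
    speak: list[dict] = []
    for line in raw.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # untagged lines are ignored in this sketch
        kind, who, text = match.groups()
        entry = {"who": who.strip(), "text": text}
        (confer if kind == "CONFER" else speak).append(entry)
    return confer, speak
```

Under this assumed format, a character that passes simply has no `SPEAK` line for the turn, so the parser needs no special case for it.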
Why megaprompt for this: one model staging the confer + responses makes it trivial to change the format, add or drop a character, or dial the depth of the deliberation — all prompt edits, no routing or per-context plumbing.
Adding / removing a voice — at any turn
The user manages the cast live from the sub-strip roster: ✕ retires a character, ⊕ adds one from the character library. Changes land at the next turn boundary, never mid-volley, and they are shaped to preserve the prompt cache.
cast changes, cache-aware
Add → append the new character's materials to the end of the system block. Everything before stays cached; you pay a one-time cache write for just the new tail (it shows on that turn's cost meter). The joiner is omniscient — the one model already holds the thread, so no catch-up beat is needed.
Retire → a simple parting line + an appended "stop voicing X" instruction. Their materials stay in place (dormant, still cheap cached reads) — deleting from the middle would shift the prefix and bust the cache. Garbage-collected only on a fresh session.
Logged as join / leave events; participants[] updates. Down to one character → no group confer (it's a 1:1). The last character cannot be retired: keep ≥1.
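The cache-aware rules above amount to treating the system block as append-only. A minimal sketch, assuming a hypothetical `Cast` class and an illustrative wording for the stop instruction (neither is taken from the project's code):

```python
from dataclasses import dataclass, field


@dataclass
class Cast:
    """Append-only system block plus the live roster (hypothetical shape)."""
    system_blocks: list[str] = field(default_factory=list)  # never edited in the middle
    active: list[str] = field(default_factory=list)         # mirrors participants[]
    events: list[dict] = field(default_factory=list)

    def add(self, name: str, materials: str) -> None:
        """Join: append materials to the tail so the existing prefix stays cached."""
        if name in self.active:
            return
        self.system_blocks.append(materials)
        self.active.append(name)
        self.events.append({"type": "join", "who": name})

    def retire(self, name: str) -> None:
        """Leave: keep the materials in place and append a stop instruction instead."""
        if name not in self.active:
            raise ValueError(f"{name} is not in the cast")
        if len(self.active) <= 1:
            raise ValueError("keep at least one character")
        self.system_blocks.append(f"Stop voicing {name}.")  # illustrative wording
        self.active.remove(name)
        self.events.append({"type": "leave", "who": name})

    @property
    def group_confer(self) -> bool:
        """With a single character left the turn is a 1:1, so no group confer."""
        return len(self.active) > 1
```

Because both operations only ever append, every earlier entry in `system_blocks` keeps its position, which is the property a prefix-based cache depends on.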
confer / deliberation — the flowing back-and-forth among the characters, logged as first-class deliberation events: how each turn's responses were formed. Shown in the slide-out side panel, never in the main chat.
relevance-led order — the character the turn most implicates leads; the others react to the delivered line (not the backstage draft), so going first actually matters.
@-address — the override: skips the confer and answers as the one named persona.
per-turn cost — every turn records its tokens + API price as a cost event (the dear cache-write on turn 1, cheap reads after); shown left of the floor marker in the chat, totalled in the sub-strip.
local-only backend — the server is a researcher tool on localhost; only the recorded session is committed and published to the CF-Pages viewer.
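The per-turn cost entry above could be recorded along these lines. This is a sketch only: the `Rates` fields, the usage keys, and the event shape are assumptions, and real per-token prices must come from the API provider's price list rather than from this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Price per million tokens for each usage bucket, supplied by the caller."""
    input: float
    cache_write: float
    cache_read: float
    output: float


def cost_event(turn: int, usage: dict, rates: Rates) -> dict:
    """Build one cost event from a turn's token usage.

    `usage` maps hypothetical bucket names ("input", "cache_write",
    "cache_read", "output") to token counts; missing buckets count as zero.
    """
    price = (
        usage.get("input", 0) * rates.input
        + usage.get("cache_write", 0) * rates.cache_write
        + usage.get("cache_read", 0) * rates.cache_read
        + usage.get("output", 0) * rates.output
    ) / 1_000_000
    return {"type": "cost", "turn": turn, "tokens": dict(usage), "price": round(price, 6)}


def session_total(events: list[dict]) -> float:
    """Sum the price of all cost events, e.g. for the sub-strip total."""
    return round(sum(e["price"] for e in events if e["type"] == "cost"), 6)
```

With any rate table where cache writes cost more than cache reads, turn 1 (which writes the whole prefix to the cache) prices higher than later turns that mostly read it back.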