Design notes

Design notes and current-status records for the live room — a chat where real people talk with a cast of AI personas, voiced by one model. Four sections: what it does (Product), how it's built (Architecture), how its off-spec output is kept honest (Harness), and how it looks (Front end) — plus a legacy shelf of superseded notes and frozen design specimens. Kept in step with the latest committed prototype.

01 Product

product · user POV

Current functions

Everything the live room does today, from a user's seat — rooms and cast, the confer-then-speak panel, the furniture for play, and the lifecycle every instrument shares.

see what it does →

design · the Notes-tab read

看见 / Seen

An on-demand read that lands in your Notes: a batch of chats distilled into something you'd keep, signed by a voice from your own room. Live in the app; this is the original study.

open the mockup →

design · the Character vividness dial

Room dials

Now one creation-time control: Character vividness — how vividly each persona plays itself (True to life → Larger than life, 3 notches), baked into the cached prefix. The old Temperament (panel-debate) dial was retired: conflict is the floor producer's call now — read from the room, steered in plain language, with Cues to teach it. Live in the new-chat modal.

read the note →

plan · what's next

Roadmap

The agreed-direction features from idea.md, each with the reasoning kept until it is built. Floor producer and dispatches shipped; listen mode still open.

read the roadmap →

strategy · the Pincus lens

Proven · Better · New

The prototype audited through Pincus's Proven · Better · New framework. Thirteen proven chat mechanics named — 8 shipped, 5 open — each with a phone-frame mockup.

read the audit →

design · steering, surfaced

Cues

Suggestion chips that teach you to steer the panel in plain language —「have them disagree」— then fade as you learn. The user-facing surface of the floor producer. A concept note.

read the note →

contents · 15 rows + an archive · the toolbox track

The toolbox — contents

Everything on the room's instruments behind one door, ordered by what you are about to do: the rules · the build · the interface a person touches · the interface the AI touches · what was measured. Curated to fifteen; the pages the track outgrew are still one click away.

open the contents →

built · one rail, four surfaces

Notifications

Every event writes a durable row, mirrored live as a toast, a dot on ≡, and one mixed-kind inbox. Three kinds, one compact form — and「build done」arrives as a message from the persona herself.

read the design + build notes →

design · the v1 instrument

Feedback widget

How we measure the floor producer with no user base and PII-heavy conversations: collect the signal in-product — a per-turn thumb + lever-mapped tags (too long · too soft · no real debate · wrong person) + a note, plus a ~20-turn check-in survey. Runs under three blind arms (no-FP / v0 / v1), and logs each tag with the staging that produced it (the seed of a trained policy) — while shipping real value to the user. Shipped and live; the first piece of v1.

open the mockup →

design · the new-chat window

Smart panel selector

Shipped 2026-06-18 — kept as the interactive mockup of roadmap §5: describe your situation and the room curates a panel you can edit, instead of browsing the whole library. Removable rows with the reason on the right, three panels a tap apart, hand-pick reusing today's library dialog, and the dials folded behind Advanced. Phone-first, themed, with the project's real persona roster.

open the mockup →

built · the persona factory

Studio

Building a persona as an app function: name a figure, confirm once, and the machine runs unattended — gather · curate · author · two cold auditors — landing it in your Studio. Live at /builder.

read the design →

built · permission & lifecycle

Persona visibility

Settled + built 2026-07-13 (v396) — the one model for who sees and uses every persona: 2 axes × 2 values (private | public · alive | deleted) + immutable created_by, five verbs (build · Delete/Restore · clone · promote · demote), one predicate (may_discover) every discovery surface calls, and one invariant — an admin never raises the visibility of someone else's persona; promotion crosses ownership by copy (clone → test → promote in place, identity kept). Shipped: the smart-suggester leak closed (it read the library unfiltered — and so did the Vibes brew), promote ownership-guarded, take-off repo-only, Delete-requires-private + Restore with a Studio Deleted section, the persona-page room grant, admins keep demoted personas in their picker (the staging shelf). Norms pinned: seated = shared, no true deletion + the erasure carve-out, room-level moderation. Clone shipped too (v397) — "Clone to my Studio" in the console: fresh slug + ULID, born private, lineage credit on the card; publishing another user's work is now clone → test → promote, exactly as designed. v398 closed the last gaps: /api/config strictly filtered (no hidden cards in a non-admin payload — the devtools soft leak), seated personas ride the room's own /state cards (the grant made concrete), and clones get a credit anonymize (console "Remove credit" — the dissociation lever). v399, from a live hand-test hole: the seating door — a member could start a fresh private chat with a granted private persona from its profile page; now every seating endpoint (create · invite · moment-join) requires each slug be discoverable by the seater, and the granted page's chat door greys out with "This is a private persona" (v400) — an explained lock, not a vanished button. Nothing from the spec remains open.

read the spec →

design · the characters' feed

Vibes content engine

How the cast's 朋友圈 feed gets its posts with no hand-authoring: the model writes, the system curates. Taste lives in the prompt; memory, distribution and verification live in code.

read the design →

02 Architecture

infra · the China path

Deployment topology

How traffic reaches the box from mainland China — grey-cloud DNS to a Singapore ECS instance, Caddy and TLS on the box, and the outbound path to the API. Three flows.

open the diagram →

map · the built system

Function map

How the running system is wired: the three tiers, a message's journey from send to reply, the two channels that never cross, and which file owns what.

open the map →

specimen · one real turn

Megaprompt anatomy

One real panel reply's payload dissected byte by byte, colour-coded by who wrote each segment. The punchline: the humans typed 2.7% of what the model reads.

open the dissection →

mechanic · prompt cache

Prompt caching

How the room reuses the model's cached prefix: built front-to-back, what keeps vs busts it when a voice is added or retired mid-room, and the §9 compaction plan. The mechanic — its economics live in Cache TTL.

read the mechanic →

economics · the TTL choice

Cache TTL economics

A measured ledger: on a deliberative human room the 1-hour TTL default runs ~58% cheaper than the 5-minute window ($1.78 → $0.75). Why 1h is the default, and where the boundary flips.

read the economics →

process · localhost → box

Dev loop

How a change ships: built on localhost:8011 → I self-test (simulate real users; DeepSeek cheap, Sonnet sparingly) → you test → push to the SG box. The 8011 dev-data norm, the service-worker gotcha, the smoketest, and the batch-deploy rule.

read the loop →

03 Harness

build · the parse layer

Harness

One model call confers, speaks and hands over documents at once — so the ways it arrives off-spec multiply. The chain that catches them, and the surprise live traffic sprang: the fallback path is the main path.

read the writeup →

exam · the routine one

The exam

The toolbox battery as one command with a score — 41 scenarios, both turn paths, checks over the event ledger and never the transcript. The finding that decided the ship: the split buys reach and spends restraint.

read the run →

method · read before the next game

Building a cartridge

The boundary tests, the loop, and every law with the room that paid for it. A directive can shape how something is said; it cannot stop something being said — when a prohibition keeps losing, build a door.

read the method →

design + build · the games

The game device

The GM holds a party-game app and hosts through it. The server deals, counts the ballot, checks the end and opens the pile; the persona is asked only for moments — facts and obligations, never wording. D1–D4 live: the schema, the runner, and four cartridges on the shelf — 卧底 · 二十问 · 真心话大冒险 · 大话骰.

read the design + the build →

specimen · the hand-prompted game

The dice specimen

大话骰, built live with the owner in one day, recorded whole: the three-engine split（the world is code, the voice is a mind — and the mind is two tiers）, the collect phase, the strip, the flash voice with a persona capsule, the arithmetic verdict, and every law twelve audited games paid for.

read the specimen →

exam · sim → improve → sim

The sim loop

Four simulated users through the real app, one game at a time: authoritative rules → the kit rewritten for its GM reader → the troupe plays → two-lens analysis → fix at the right layer → loop until clean. The plan and the scoreboard.

review the plan →

exam · red-team + functional run

QA test run

A live pass on real rooms — the human×persona cardinality matrix, artifacts, and a full red-team of the floor producer. Headline: Flash was prompt-injectable, now closed.

read the run →

exam · the flatness diagnosis

Conversation quality

The first big production exam: 80 rooms pulled, 39 scored. The panel is too long, too agreeable, and the extreme dial delivers a third of the conflict it promises — a staging problem, not a voice one.

read the diagnosis →

design + exam · roadmap §1

Floor producer

The per-turn director that sets staging — who speaks, how long, how hard to push — never the content. Shipped to production, with the blind-judged quality study folded in.

read the design + study →

built · v2 "manners" · 2026-07-15

Seven steerings

Seven behaviour rules, all about how a turn ends: three bug fixes, two insurance, one new capability, one parked. The six now ship as the v2「manners」variant.

read the verdict →

exam · the four-dimension audit

Systemic check (v349)

One pass over the whole prototype — the routine safety net (smoketest PASS), then the four asked-for dimensions. Performance: the one slow path is the cold first open (1.76 s of shell wire from China) plus two disk-scanning endpoints and three scaling time-bombs. Language: 8 of 10 core concepts carry two-plus names — the new-chat picker says character · cast · panel in a single flow. UI reuse: ~8 dialog scaffolds, ~11 close-× treatments (top 10–24 px / right 6–28 px), the same button under 3 names in 3 files — because there are no shared tokens. Formats: 4 relative-time formatters, faux-bold 700, dark-mode bypasses. All of it distilled into 38 numbered fix points — tap rows to pick, the tray collects serials, then say “fix #2 #7 #22”. Outcome: all 38 fixed and deployed (v350–v374 — the terminology canon, the theme.css/fmt.js shared layer, the perf batch, one format voice).

the record →

plan · the language layer

Room language

Every room has a target language — detected from the Smart-selector's Describe box, or set in more settings. The goal: every persona speaks it, while staying themselves. Measured: 90% overall, but the misses are concentrated — strongly language-anchored figures (Confucius) and hard targets (Japanese). The plan, in two layers: stop the prompt contradicting itself (separate voice from language, drop the schema's "their own language"), then a translate-on-divergence net — a slipped line shown original ─ hairline ─ translation, persisted with the message (saveable to Notes, with go-to-line), while the model's memory keeps the target. Layers A + B shipped & deployed to production (Japanese opening 1/4→4/4; Confucius speaks French; the translate net persists + is saveable with go-to-line).

read the plan →

04 Front end

reference · the app with the behaviour removed

Skeleton

The running app stripped to UI only — and it opens in a browser. Every colour, radius and font is the real one, because it is the real one; only the behaviour is gone.

open the skeleton →

reference · names and reasons

UI Kit

The companion index: every module named, with the one thing about it that isn't obvious from looking — why one glyph stands for a whole group in the chat list, why the emoji panel is a keyboard-space occupant, why your own bubble has no colour bar. Each card carries the app's real selectors (grep them) and deep-links into that surface in the skeleton. It deliberately draws nothing: the first version hand-reproduced components and got three of them wrong, so the drawing moved to the generated file and this page kept only what a generator can't produce.

open the UI kit →

post-mortem · how we work

Lessons

What the gap 10 build actually taught, in eleven lessons. The expensive parts were never the code: a test that skips the mechanism under test passes however broken that mechanism is; a maintained list is the wrong shape for a universal rule (it failed three separate times); a registered layer you cannot see is worse than an unregistered one. Plus when to stop — the back-gesture slide turned out to be the browser's, unfixable, and proving that was cheaper than a fourth attempt. Grouped by cause: only one of twelve versions was a feature.

read the lessons →

reference · names + live design

Style Guide

Every part of the running chat, named and shown in its current form, in all three themes. A living reference, kept in sync on every ship.

open the style guide →

design · the tool icon set

Room-tool icons

A house-drawn SVG for each of the room's seven tools, built from shades of one house accent — never a pasted emoji. Shown as a grid, a sheet, a chip row and the real bubbles.

open the icon set →

design · toolbox T1 · shipped v652

The timer card

The clock was the last tool whose outcome arrived as a capsule. A move is a capsule, an outcome is a card — so the ring becomes one (setting a timer still does not), built from the shared tool-bubble parts and nothing else: band · setter · watermark · the digits. The three forks it turned on, both sides rendered in both themes — does the ring want the eye, whose name goes in the band, and what a timer with no label should say.

open the design →

mock · the admin console

Console redesign

The admin console's junk-drawer System page regrouped into three single-purpose tabs — Status · Models · Settings. An interactive mock; the live console is unchanged. A proposal.

open the mockup →

post-mortem · mobile composer

The virtual keyboard

The nine-round war (v469–v477) over the on-screen keyboard: bubbles covered, an iOS gap, the page-shove flash. Three separate root causes, one mental model — Android resizes, iOS pans — and the final architecture: one viewport-meta key for Android, and on iOS a measured pre-shrink + programmatic-focus tap routing so Safari never has a reason to pan. The seven traps that each cost a round, and the #kbdebug regression recipe.

read the post-mortem →

post-mortem · selectable text

Text selection on touch

The ten-round build (v490–v499) of the double-tap text sheet — WeChat's 双击 → full-screen selectable text to Save or Copy. Why it's a dark corner: the OS's own selection UI (Android's handles, iOS's callout, Chrome's search bar) can't be styled or hidden while a native selection exists — so the only fix is to give it up entirely (user-select:none) and rebuild highlight, handles, word-select and hit-testing yourself. Plus the iOS killer: caretRangeFromPoint returns null under user-select:none, so you hit-test by measurement. The device is the only truth; the emulator lies about touch.

read the post-mortem →

record · gap 10 · shipped v516

The dismissal sweep

Where every layer's close sits, and what the back gesture does to it — one card per dismissible surface, classified into the four types a phone already teaches: a page goes back top-left, a task dialog cancels left and commits right, only overlays close top-right, sheets swipe down. Derived from registerBackLayers() + the shared .dialog-x scaffold: 4 moved · 8 stamped · 23 already compliant, all opt-in and phone-only so desktop came out unchanged. Plus the gesture half — the iOS horizontal reveal traced to one under-scoped CSS rule of our own, one Esc handler across 24 uncovered layers, and a back-gesture dead zone in the Studio iframe under a destructive confirm (recorded, deferred). The standing record of which dialog is which type.

read the record →

research · the chat-app bar

Rendering & speed

To our users we're a chat app, judged against ChatGPT · Gemini · Claude. How the best apps render output and feel fast, and a grounded plan for ours: the four-stage pipeline (markdown · KaTeX · syntax-highlight · sanitize), the $ math-vs-money fix, never exposing <speak>, and the one big speed gap — streaming (TTFT). Steps 1–2 shipped to production; gap matrix + roadmap inside.

read the note →

study · speed on a slow link

Perceived speed & the slow-network playbook

Why the Singapore box felt slow — measured: it's the wire, not the server (a 1.5 KB file takes 1.4 s). What WeChat (Mars · mmtls · smart heartbeat) and WhatsApp (local store · pending-queue · ✓✓) do on bad networks, then a scorecard of our app against it. Shipped to production: one-round-trip /api/boot, persistent per-user cache, optimistic everything, live-turn recache, access-loss eviction, a durable send outbox (retry + “!”), delta sync. What's left: the ~1.4 s handshake (HTTP/3 / edge). Merges the old Perceived speed + Slow-network playbook; three SVGs inside.

read the study →

research · messenger reflexes

UX benchmark

We look like a messenger, so users arrive with messenger reflexes. The live room held against WhatsApp, WeChat, and Telegram across style · interaction · function · the title-menu hub — faithful four-app redraws, a borrow / refuse filter, and a ranked, impact×effort set of borrowings that fit a room full of AI personas (long-press menu, quote-to-panel, search, 置顶/免打扰, Telegram's title-menu hub). Plus what we deliberately won't take.

open the benchmark →

design · seat colour

Persona colour

Colour is a per-room seat, not a persona's identity — a rotated OKLCH palette so a cast always reads distinct, and the same figure can wear different colours in different rooms. Grey is system. Live since v88.

read the design →

design · voice input

Push-to-talk

Voice input for the composer: six patterns weighed, the live-dictation composer shipped. The alternatives are kept as the record.

read the study →

research · diagrams

LLM whiteboards

How a persona hands over a drawing: Mermaid vs Excalidraw vs tldraw. Mermaid shipped as a bubble renderer (→ the wildcard pane); a shared canvas stays a parked phase-2 idea.

read the note →

05 Legacy & specimens

Kept for provenance — frozen design specimens whose pick has shipped, and pages superseded by the current docs above. Not maintained; each carries an in-page banner.

Specimens — explorations whose decision shipped

Composer studies — 13 landing-composer takes; shipped “Coral flow”.
Convene-button studies — the “arrow → 3 heads” glyph shipped.
Wide-mode composer studies — resolved: the muted call-pill shipped.
Link-unfurl mock — a working prototype; not yet built into the room.
Chat-list avatar study — four treatments for the row's leading tile; the plain figure + linear disc-blend shipped v531–v532.

Superseded — replaced by current docs

The build plan — phases A/B/C all shipped; now build history. → Function map.
Architecture (old) — pre-streaming, per-room cast “planned”, exp-010. → Function map.
Repo sync — the retired Mac-mini / branch model. → Dev loop.
DECIDE gate — a response-gate design on the frozen exp-010 study.
Narrow-scrollbar specimen — the call was made; #10–15 now hidden. → Style Guide.

Resolved decisions

reply model: Busy-state batching — idle → answer now; busy → queue, then drain the whole queue in one turn. Caps API calls at the panel's turn rate, not the human typing rate.
signup gate: Invite code at registration + a per-user daily cap. Public URL + paid API = open wallet without it.
transport: SSE-native — instant message delivery + an "X is typing…" indicator. Presence is RAM-only; it never touches state.json.
hosting: Singapore (Ali ECS), domain xbbapp.com, grey-cloud DNS direct to the box, Caddy for TLS. Hong Kong is blocked by Anthropic.
seed-admin: On migration, existing rooms are assigned to you; the second tester joins by invite.