OmniTutor

▸ streaming architecture · zero-spinner spec

It must never feel slow.

On-demand generation is OK. Cold start is OK. Spinners are not. Every click triggers visible engagement within 400ms (drama modal, subject animation, the owl appearing), while Haiku drafts the first beat in parallel and Sonnet streams the rest behind the scenes.

▸ The hard guarantee

From click to first visual: 400ms. From click to the owl speaking: 1.5 seconds. From click to Beat 1 rendered with the tutor narrating: 3 seconds. Never blank, never spinner, never blocking.

≤ 0.4s to first visual

≤ 1.5s to owl speaking

≤ 3s to beat 1 rendered

0 spinners ever
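
One way to keep these numbers honest is to encode them as data and check recorded timings against them. The sketch below is illustrative only: the milestone names, the `Timings` shape, and `budgetViolations` are assumptions, not part of any existing OmniTutor code.

```typescript
// Budgets copied from the guarantee above, in milliseconds since the click.
const BUDGET_MS = {
  firstVisual: 400,
  owlSpeaking: 1500,
  beat1Rendered: 3000,
} as const;

type Milestone = keyof typeof BUDGET_MS;

// Hypothetical shape: timestamps (ms since click) recorded by the client.
type Timings = Partial<Record<Milestone, number>>;

// Returns the milestones that were late or never reached.
function budgetViolations(timings: Timings): Milestone[] {
  const milestones = Object.keys(BUDGET_MS) as Milestone[];
  return milestones.filter((m) => {
    const t = timings[m];
    return t === undefined || t > BUDGET_MS[m];
  });
}
```

A check like this could run in telemetry or in an end-to-end test, where any non-empty result is treated as a broken contract.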

1. The sequence

Four lanes run in parallel: the user, the UI (drama modal + owl), Haiku (fast first beat), Sonnet (streamed remainder). The user never sees a blank moment because the UI is already filled before the model has answered.

Time	User (student)	UI (drama + owl)	Haiku (fast · plan + B1)	Sonnet (stream · B2–B7)
T+0s	Click. Topic chip, search, or text submit.	Drama modal opens instantly. Subject background animation starts (cascade · curve · parchment · scripts · ?-stars).	API call fired with topic + context. Haiku begins generating plan + Beat 1.	waiting
T+0.4s	watching the animation · subject context registers	Owl appears in the corner of the modal. Speech bubble starts typing the subject-aware greeting.	drafting…	-
T+1.0s	reads bubble · is now already engaged with the subject	Bubble fully typed. Animation continues. Owl gentle breathing.	drafting…	-
T+1.5s	owl is now narrating · audio playing	TTS audio starts. Owl mouth lip-syncs to incoming audio. Beat 1 outline visible behind drama animation.	Plan + Beat 1 returned. UI starts hydrating.	Sonnet call fired. Beats 2–7 generation starts.
T+3.0s	sees beat 1 rendered · listens to tutor	Drama modal dissolves into the lesson. Beat 1 board fully rendered. Tutor narrates "and to start, look here…"	done	Streaming… Beat 2 partial · 3 not started.
T+8s	on Beat 1 · interacting with Tutor	Beat 1 active. Right-rail plan shows beats 2–4 already populated, 5–7 still coming.	done	Beats 2–4 done. Beats 5–7 streaming.
T+15s	still on Beat 1. When ready to advance, Beat 2 is already there.	All beats hydrated. Going-further panel populated. Adjust-level available.	-	done. Cached for next session.
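
The hand-off in this sequence can be sketched as a single orchestration function. This is a minimal sketch under stated assumptions: the `Models` and `Ui` interfaces, their method names, and `startSession` are hypothetical, and the model clients are injected rather than tied to any real SDK.

```typescript
type Beat = { index: number; content: string };
type Plan = { topic: string; beatCount: number };

// Hypothetical model clients, injected so the sketch stays provider-agnostic.
interface Models {
  fastFirstBeat(topic: string): Promise<{ plan: Plan; beat1: Beat }>;
  streamRemaining(plan: Plan): AsyncIterable<Beat>;
}

// Hypothetical UI hooks; none of them waits on a model.
interface Ui {
  openDrama(topic: string): void;
  renderBeat(beat: Beat): void;
  dissolveDrama(): void;
}

async function startSession(topic: string, models: Models, ui: Ui): Promise<Beat[]> {
  ui.openDrama(topic); // T+0: something is visible before any model answers
  const { plan, beat1 } = await models.fastFirstBeat(topic); // fast lane
  // Obtain the stream as soon as the plan exists; an eager client would
  // begin generating the remaining beats at this point.
  const rest = models.streamRemaining(plan);
  const beats: Beat[] = [beat1];
  ui.renderBeat(beat1);
  ui.dissolveDrama(); // the lesson replaces the drama modal
  for await (const beat of rest) {
    beats.push(beat); // later beats hydrate while the student is on Beat 1
    ui.renderBeat(beat);
  }
  return beats;
}
```

The design point is ordering: the UI call comes first, and nothing the student sees depends on the slower stream having finished.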

2. Model choice per role

The right model for the right job. Haiku is fast enough to start the conversation; Sonnet is rich enough to teach.

Role	Model	Latency budget	Why this model
First response · plan + Beat 1	Haiku 4.5	≤ 1.5s	Speed-first. Outline-quality is fine — the student is in the drama modal anyway. Get to "Tutor is speaking" as fast as possible.
Streaming beats · 2–7	Sonnet 4.6	~8–15s total · streamed	Quality matters here. Rich worked examples, derivations, intuition. Streamed in parallel while student is on Beat 1.
Hard derivations · physics & math	Opus 4.7	background	Reserved for STEM heavy lifting where mistakes are expensive. Used selectively, not for every beat.
Audio narration · TTS	ElevenLabs / OpenAI	~200ms TTFB · streamed	Voice streams as Tutor speaks. Owl mouth lip-syncs to audio amplitude. No "loading" between sentences.
In-lesson Q&A · Ask AI	Haiku 4.5	≤ 2s	Mid-lesson questions need to feel instant. Haiku is the conversational model.
Adjust-level rework	Sonnet 4.6	~3s · with shimmer	Re-plan on student feedback. The shimmer + audio-regen make 3s feel intentional, not slow.
Visualize / Animate chips	Sonnet + canvas tool	~4–6s · drama	Triggered on demand. Drama modal again while diagram/sim is generated.
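
The table above could be carried in code as a typed routing map, so every call site looks up its model and budget instead of hard-coding them. This is a sketch, not an existing module: the role keys and function names are hypothetical, the model strings are the display names from the table (not API model identifiers), and ranged budgets use the upper end of the range. The TTS row is left out because it names providers rather than a single model.

```typescript
// Role keys are hypothetical labels for the rows of the table.
type TutorRole =
  | "firstResponse"
  | "streamBeats"
  | "hardDerivation"
  | "inLessonQa"
  | "adjustLevel"
  | "visualize";

interface Route {
  model: string;           // display name from the table, not an API model ID
  budgetMs: number | null; // null = background work, no interactive budget
}

const ROUTES: Record<TutorRole, Route> = {
  firstResponse: { model: "Haiku 4.5", budgetMs: 1500 },
  streamBeats: { model: "Sonnet 4.6", budgetMs: 15000 },
  hardDerivation: { model: "Opus 4.7", budgetMs: null },
  inLessonQa: { model: "Haiku 4.5", budgetMs: 2000 },
  adjustLevel: { model: "Sonnet 4.6", budgetMs: 3000 },
  visualize: { model: "Sonnet", budgetMs: 6000 },
};

function routeFor(role: TutorRole): Route {
  return ROUTES[role];
}

// True when the elapsed time is inside the role's budget (or it has none).
function withinBudget(role: TutorRole, elapsedMs: number): boolean {
  const { budgetMs } = ROUTES[role];
  return budgetMs === null || elapsedMs <= budgetMs;
}
```

Because `ROUTES` is typed as `Record<TutorRole, Route>`, adding a role without a route fails at compile time.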

3. Invocation paths

The student can invoke a session two ways. Both flow through the same engine — different first frames.

▸ free-form text

Type or speak anything

From the search bar in discovery, or the empty Ask-AI textarea. Topic is unconstrained — could be anything in the world.

type "Bohr atom" → setup modal (pre-filled topic)
→ plan modal (Haiku + drama)
→ active runtime (Sonnet streaming)

▸ landing-page click

Click a tile

From discovery, subject pages, paths, recents. Topic is structured — we know the subject family, level hint, exam target.

click chip "Hooke's law" → setup (pre-filled · level hint)
→ plan modal (subject-aware drama)
→ active runtime (Sonnet streaming)
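
Since both paths feed the same engine, one possible shape is a discriminated union that is normalized into a single request before setup. The field names below are assumptions: the text only says a tile click carries a subject family, level hint, and exam target, while free text carries none of these.

```typescript
// Two ways to start a session; field names are hypothetical.
type Invocation =
  | { kind: "freeText"; text: string }
  | {
      kind: "tile";
      topic: string;
      subjectFamily: string;
      levelHint?: string;
      examTarget?: string;
    };

// What the shared engine receives, regardless of entry point.
interface SessionRequest {
  topic: string;
  subjectFamily: string | null; // null = unknown subject (free-form text)
  levelHint: string | null;
  examTarget: string | null;
}

function toSessionRequest(inv: Invocation): SessionRequest {
  if (inv.kind === "freeText") {
    // Unconstrained topic: nothing structured is known yet.
    return { topic: inv.text.trim(), subjectFamily: null, levelHint: null, examTarget: null };
  }
  // Structured topic: pass the known hints through to setup.
  return {
    topic: inv.topic,
    subjectFamily: inv.subjectFamily,
    levelHint: inv.levelHint ?? null,
    examTarget: inv.examTarget ?? null,
  };
}
```

Downstream code then only needs to handle `SessionRequest`, and the "different first frames" reduce to whether `subjectFamily` is known.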

4. The rules

Architectural constraints we don't relax. Each rule is a guarantee to the student.

▸ Never

No spinners. Anywhere. Not in the runtime, not in modals, not in chip clicks. If we see a spinner, we've broken the contract.

▸ Never

No blank screens. Every navigation produces visible content within 400ms — drama, owl, cached preview, or skeleton with subject context.

▸ Never

No blocking single-model calls. If we're waiting on a model, the UI must already be entertaining the student.

▸ Never

No "please wait" copy. The owl is allowed to say "let me think" once. After that, it teaches.

▸ Always

Drama is on-topic. Subject-aware backdrop, owl bubble that previews the lesson. The student is being taught from second zero.

▸ Always

Beats stream behind the scenes. By the time the student finishes Beat 1, Beats 2–4 are ready. By Beat 4, all 7 are done. Zero waiting between beats.

▸ Always

TTS streams sentence-by-sentence. Owl starts speaking as the audio bytes arrive. No "audio loading" pause.

▸ Always

First-time content gets cached for everyone. The next student to ask the same question gets it instantly. Cache grows organically toward Canvas-A's hand-crafted polish.
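
The sentence-by-sentence TTS rule above implies a small buffering step between the model's text stream and the voice provider. The sketch below shows one naive way to do it; `SentenceChunker` is a hypothetical helper, and its boundary rule (a terminator followed by whitespace) is an assumption that will split early on abbreviations such as "e.g. ".

```typescript
// Buffers streamed text and releases whole sentences, so each sentence can
// be handed to a TTS provider as soon as it is complete.
class SentenceChunker {
  private buffer = "";

  // Feed one streamed fragment; returns any sentences it completed.
  push(fragment: string): string[] {
    this.buffer += fragment;
    const sentences: string[] = [];
    const boundary = /[.!?]\s+/; // naive: terminator followed by whitespace
    let match = boundary.exec(this.buffer);
    while (match !== null) {
      const end = match.index + 1; // keep the terminator, drop the whitespace
      sentences.push(this.buffer.slice(0, end).trim());
      this.buffer = this.buffer.slice(match.index + match[0].length);
      match = boundary.exec(this.buffer);
    }
    return sentences;
  }

  // Call when the text stream ends to release any trailing text.
  flush(): string[] {
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest.length > 0 ? [rest] : [];
  }
}
```

Each returned sentence can be sent for synthesis immediately, which is what lets audio start before the full beat has been generated.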

5. How this powers the milestones

▸ M1 · "It works, and it never feels slow"

Every feature in the tutor functioning end-to-end. Both invocation paths live. Haiku-first + Sonnet-stream + drama modal is the default cold-start pattern. All content LLM-generated; no caching beyond the browser. The tutor is real — slow on absolute first hit, but the student never feels it because the drama covers the gap.

▸ M2 · "Always instant"

The most-asked content gets pre-generated and cached, Canvas-A style. Hot path: cached lessons load in <200ms — drama modal is a half-second flicker, then full lesson. Cold path: still on-demand, still <3s, still no spinners. Same architecture; cache is a performance optimization, not an experience change.
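
The hot/cold split in M2 can be sketched as a cache lookup that only changes the budgets, not the flow. Everything here is illustrative: `CachedLesson`, `LoadPlan`, `cacheKey`, and `planLoad` are hypothetical names, the key normalization is an assumption, and the numbers are the ones quoted above (200ms hot load, half-second drama, 3s cold path).

```typescript
// Hypothetical cached lesson shape.
interface CachedLesson {
  topic: string;
  beats: string[];
}

interface LoadPlan {
  path: "hot" | "cold";
  lesson: CachedLesson | null; // null on the cold path: generate on demand
  loadBudgetMs: number;
  dramaMs: number; // how long the drama modal is expected to stay up
}

// Assumption: lessons are keyed by a case- and whitespace-insensitive topic.
function cacheKey(topic: string): string {
  return topic.trim().toLowerCase().replace(/\s+/g, " ");
}

function planLoad(store: Map<string, CachedLesson>, topic: string): LoadPlan {
  const hit = store.get(cacheKey(topic));
  if (hit !== undefined) {
    // Hot path: cached lesson, drama is only a brief flicker.
    return { path: "hot", lesson: hit, loadBudgetMs: 200, dramaMs: 500 };
  }
  // Cold path: same on-demand flow and the same 3s guarantee.
  return { path: "cold", lesson: null, loadBudgetMs: 3000, dramaMs: 3000 };
}
```

Both branches return the same `LoadPlan` type, which mirrors the claim that the cache is a performance optimization rather than a different experience.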

▸ Why this scales

The drama modal is a UI investment, not infrastructure. We can run on slower models, smaller GPU budgets, or in regions with high model latency — and the student still feels engaged. As infra improves, the drama modal shrinks toward instant. The student never notices the change.