OmniTutor
▸ build roadmap · 10 phases · M1 / M2 / M3
Build it in ten steps. Then productize. Then scale.
10 atomic phases, each shippable, each tested. Land all 10 and you have M1: a real tutor that always feels fast. Then comes M2 (productization) and M3 (scale & differentiation).
Control census · what we owe the user
Every interactive surface across the runtime + modals + discovery. None decorative. By P10, every one functional.
| Region | Count | What |
| --- | --- | --- |
| Top bar | 16 | Brand · Lesson ▾ · Save & exit · 4 mode segments · Level ▾ · 3 verbosity · Make me ▾ + 6 items · ⚙ |
| Lesson dropdown | 7 | + Start new · 6 row types (now / same chapter / recent) |
| Beat strip | 4 | Beat-number nav · Play · Pause · Stop |
| Whiteboard toolbar | 11 | Pen · Highlight · Erase · Text · 5 colors · Undo · Clear · Fullscreen · 3 board tabs |
| Ask area | 5 | Textarea · Ask AI · 🎙 · 📎 · 📷 |
| I-want-to chips | 6 | Mode-aware: Explain · Example · Visualize · Animate · Quiz/Stuck/Skip/Look · Got it/Advance/Done |
| Coach board | 6 | Answer input · Submit · Hint · Show me · Why · Skip step |
| Test board | 3 | Options A–D · Submit · Skip |
| Adjust modal | 17 | X · 5-step direction · 9 axes · textarea · 🎙 · Cancel · Apply |
| Setup modal | 22 | X · 16 segments · textarea · advanced toggle + advanced inputs · skip · Plan |
| Plan modal | 13 | Back · audio play · 7 iterate chips · textarea · 🎙 · Send/Lock |
| Right rail | 7 | Mini owl · 2 tabs · plan-tree node toggles · going-further cards · refresh · status |
| Bottom actions | 2 | Adjust level · Start new lesson |
| Discovery | ~30 | Search · 🎙 · 4 lanes of tiles · recents · paths · sign-in · brand · pricing |
| Total unique controls | ~150 | All real by P10. None stubbed. |
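As a quick arithmetic check on the census, the per-region counts can be summed. The sketch below copies the counts from the table and treats Discovery's "~30" as exactly 30, which is an assumption made only so the sum is concrete.

```python
# Per-region control counts copied from the census table.
# Discovery is listed as "~30"; treating it as exactly 30 is an assumption.
CENSUS = {
    "Top bar": 16,
    "Lesson dropdown": 7,
    "Beat strip": 4,
    "Whiteboard toolbar": 11,
    "Ask area": 5,
    "I-want-to chips": 6,
    "Coach board": 6,
    "Test board": 3,
    "Adjust modal": 17,
    "Setup modal": 22,
    "Plan modal": 13,
    "Right rail": 7,
    "Bottom actions": 2,
    "Discovery": 30,
}


def total_controls(census: dict[str, int]) -> int:
    """Sum the per-region counts."""
    return sum(census.values())


print(total_controls(CENSUS))  # 149, in line with the "~150" total row
```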
The ten phases
Each phase is atomic: testable on its own, and shipping a real working slice. Roughly 1 week each.
P1 · Foundation (heavy)
▸ scope: Repo skeleton · auth gate · design tokens (four-blue palette) · owl SVG component · KaTeX · deploy pipeline · logging · API key vault. No user-facing UI yet.
▸ design ref: design_language.html tokens · owl_brand.html SVG variants
▸ test: Healthcheck endpoint live · owl SVG renders on a test page · hello-world Haiku call returns text
P2 · Discovery + 4 subject pages (light)
▸ scope: omnitutor.ai static landing · 4 lanes · search bar (text input only) · 4 subject pages (Physics · Math · History · English), all clickable, all navigate
▸ design ref: discovery.html + 4 subject_*.html
▸ test: Click every tile → right subject page · click any item inside subject → routes to /setup?topic=X
P3 · Setup modal · info gathering (heavy)
▸ scope: 4 questions · free text · advanced (pace · tone · outputs · textbook · exclude) · skip path · validates and persists locally · hands intent object to /plan
▸ design ref: lesson_runtime_v12.html#setup
▸ test: All 22 controls click correctly · "Plan my session" hands a complete intent · skip uses defaults
P4 · Plan modal + drama infrastructure (heavy)
▸ scope: 5 drama modals as reusable component · Haiku-first call wired · plan renders · audio mock · iterate chips regenerate plan in-place with shimmer · Send/Lock fires veil → routes to /lesson
▸ design ref: drama_modals.html + lesson_runtime_v12.html#plan
▸ test: T+0.4s drama renders · T+1.5s Haiku response back · iterate chips work · Lock triggers veil · zero spinners
P5 · Streaming pipeline · backend (heavy)
▸ scope: Beat 1 from Haiku · Beats 2–7 streamed from Sonnet in parallel · partial-render protocol · cache write-through · failure modes (retry · fallback · friendly degraded)
▸ design ref: streaming_architecture.html
▸ test: Latency budget (≤3s to Beat 1) · streamed beats hydrate plan tree as they arrive · cache hit on second request <200ms
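The P5 flow can be sketched with asyncio under some loud assumptions: `fast_model` and `slow_model` are stand-ins for the Haiku and Sonnet calls (no real API is used), the cache is an in-memory dict, and the lesson is cached only once complete, which simplifies the write-through the scope calls for. The sketch shows the ordering idea: later beats start generating immediately, Beat 1 is delivered first, and the rest are delivered as they arrive.

```python
import asyncio

CACHE: dict[str, list[str]] = {}      # in-memory stand-in for the lesson cache
FALLBACK = "(friendly degraded beat)"  # shown when every attempt fails


async def fast_model(topic: str) -> str:
    """Stand-in for the fast (Haiku) call that produces Beat 1."""
    await asyncio.sleep(0.01)
    return f"Beat 1: {topic}"


async def slow_model(topic: str, n: int) -> str:
    """Stand-in for one slower (Sonnet) call that produces beat n."""
    await asyncio.sleep(0.01 * n)
    return f"Beat {n}: {topic}"


async def with_retry(make_call, retries: int = 1) -> str:
    """Try the call, retry on failure, then degrade to the fallback text."""
    for _ in range(retries + 1):
        try:
            return await make_call()
        except Exception:
            continue
    return FALLBACK


async def stream_lesson(topic: str, on_beat, total: int = 7) -> list[str]:
    """Deliver beats through on_beat(n, text); Beat 1 is always delivered first."""
    if topic in CACHE:  # hot path: replay the cached lesson in order
        for n, text in enumerate(CACHE[topic], start=1):
            on_beat(n, text)
        return CACHE[topic]

    async def produce(n: int):
        return n, await with_retry(lambda: slow_model(topic, n))

    # Start beats 2..total in parallel before waiting on Beat 1.
    tasks = [asyncio.create_task(produce(n)) for n in range(2, total + 1)]

    beats = [""] * total
    beats[0] = await with_retry(lambda: fast_model(topic))
    on_beat(1, beats[0])

    # Partial render: hand each later beat over as soon as it completes.
    for finished in asyncio.as_completed(tasks):
        n, text = await finished
        beats[n - 1] = text
        on_beat(n, text)

    if FALLBACK not in beats:  # do not cache a degraded lesson
        CACHE[topic] = beats
    return beats


if __name__ == "__main__":
    asyncio.run(stream_lesson("torque", lambda n, text: print(n, text)))
```

A second call for the same topic takes the cached branch and never touches the model stand-ins, which is the behaviour the "<200ms on second request" test is checking for.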
P6 · Active runtime shell + Teach modes (heavy)
▸ scope: Topbar · right rail · ask area · beat strip · playback widget · Teach-Concept board · Teach-Derivation board with verbosity toggle · ALL top-bar controls functional
▸ design ref: lesson_runtime_v12.html#active Teach scenarios
▸ test: Beat strip navigates · playback plays/pauses TTS · verbosity toggle changes density · all topbar items real
P7 · Coach + Test modes (medium)
▸ scope: Coach: step input · submit · hint · show me · why · skip · all evaluating real answers · prominent step-counter header · Test: MCQ · timer · submit · skip · score
▸ design ref: Coach + Test scenarios in runtime
▸ test: Wrong answers get hints · "Show me" walks through · timer ticks · grading works · all 9 controls real
P8 · Board mode + Visualize/Animate (medium)
▸ scope: Whiteboard with all 11 tools · canvas drawing · Tutor watches & comments · Visualize chip generates SVG diagram on demand · Animate chip generates simulation · drama covers gen latency
▸ design ref: Whiteboard scenario + animated chip concept
▸ test: Draw → Tutor responds · Visualize produces relevant diagram · Animate produces usable mini-sim
P9 · Adjust level + cross-cutting controls (heavy)
▸ scope: Adjust modal fully functional · Level picker · Make me dropdown (all 6 outputs really generate) · Save & exit · Start new lesson · settings · status pulse · plan-tree expand · going-further refresh · iWant chips on every mode
▸ design ref: Adjust modal + topbar + right rail
▸ test: Playwright sweep on ~50 cross-cutting controls · "Make me flashcards" actually produces flashcards · adjust really changes level
P10 · Audio · lip-sync · cache · polish (heavy)
▸ scope: ElevenLabs TTS streaming · owl mouth lip-syncs to amplitude · caching layer on (Canvas-A pattern · trending pre-warmed) · mobile-responsive sweep · accessibility (keyboard · ARIA) · final latency audit
▸ design ref: owl_brand.html lip-sync variants + streaming spec rules
▸ test: Cold-start ≤3s confirmed · cached <200ms · mobile passes · keyboard-only flow works · zero spinners under WebPageTest
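Amplitude-driven lip-sync, as opposed to the viseme approach deferred to M3, can be as simple as mapping the loudness of each audio frame to a mouth-openness value. The sketch below is one possible mapping, assuming samples normalized to [-1, 1]; the `floor`, `ceiling` and `alpha` values are illustrative guesses, not tuned numbers from the design.

```python
import math


def rms(samples: list[float]) -> float:
    """Root-mean-square level of one audio frame (samples in [-1, 1])."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mouth_openness(samples: list[float], floor: float = 0.02,
                   ceiling: float = 0.5) -> float:
    """Map frame loudness to 0.0 (closed) .. 1.0 (fully open).

    Levels at or below `floor` count as silence; levels at or above
    `ceiling` open the mouth fully. Both thresholds are assumptions.
    """
    t = (rms(samples) - floor) / (ceiling - floor)
    return max(0.0, min(1.0, t))


def smooth(previous: float, target: float, alpha: float = 0.5) -> float:
    """Exponential smoothing so the mouth does not jitter frame to frame."""
    return previous + alpha * (target - previous)
```

Per animation frame, the owl component would call `mouth_openness` on the latest audio chunk and feed the result through `smooth` before drawing.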
Three guardrails
Apply to every phase. No relaxation.
▸ no stubs: No phase ships if any control on its surface is non-functional. A stubbed "Skip step" is a fail. We complete what we touch.
▸ no spinners: No phase ships with a spinner. If a thing is slow, drama covers it. Latency budget is part of acceptance.
▸ tested: Each phase ends with an automated Playwright sweep hitting every control on its surface, plus a manual checklist signed off.
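The three guardrails can be read as a single ship/no-ship gate. The sketch below is a hypothetical way to encode that gate; the `PhaseReport` fields and budget keys are invented for illustration, while the two budget values come from the roadmap's own targets (3s cold start to Beat 1, 200ms cached). For simplicity a measurement fails only when it exceeds its budget.

```python
from dataclasses import dataclass, field

# Budgets in seconds, from the roadmap's latency targets. Key names are hypothetical.
BUDGETS = {"cold_start_to_beat_1": 3.0, "cached_response": 0.2}


@dataclass
class PhaseReport:
    """Hypothetical summary of one phase's acceptance run."""
    controls: dict[str, bool]              # control name -> is it functional?
    spinners_seen: int = 0                 # spinners observed during the sweep
    latencies: dict[str, float] = field(default_factory=dict)  # measured seconds
    sweep_passed: bool = False             # automated sweep result
    checklist_signed: bool = False         # manual checklist sign-off


def gate(report: PhaseReport) -> list[str]:
    """Return the reasons a phase cannot ship; an empty list means it ships."""
    failures = []
    stubs = sorted(name for name, ok in report.controls.items() if not ok)
    if stubs:
        failures.append("stubbed controls: " + ", ".join(stubs))
    if report.spinners_seen > 0:
        failures.append(f"{report.spinners_seen} spinner(s) seen")
    for name, measured in report.latencies.items():
        budget = BUDGETS.get(name)
        if budget is not None and measured > budget:
            failures.append(f"{name} took {measured}s, budget {budget}s")
    if not report.sweep_passed:
        failures.append("automated sweep did not pass")
    if not report.checklist_signed:
        failures.append("manual checklist not signed off")
    return failures
```

A phase ships only when `gate(report)` returns an empty list, so a single stubbed control or a single spinner is enough to block it.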
The three milestones
M1 makes everything else worth doing. M2 opens the gates. M3 makes it scale & sing.
▸ M1 · ~10 weeks · the demo
It works · and never feels slow
All 10 phases shipped. Real tutor, real models, real audio. Demo-ready to a private cohort.
▸ in M1
- All ~150 controls functional · zero stubs
- Both invocation paths work (free-form text + landing tile)
- Cold start ≤3s to Beat 1 · zero spinners · drama covers latency
- Hot start (cached) <200ms
- Audio TTS streams · owl lip-syncs to amplitude
- 5 modes work: Teach-Concept · Teach-Derivation · Coach · Test · Whiteboard
- Adjust level · Make me · Visualize/Animate chips · all real
- Mobile-responsive · keyboard-accessible · ARIA basics
- 4 subject pages curated (Physics · Math · History · English)
- Anything else generated on-demand
▸ NOT in M1 (deferred)
- Accounts (Google sign-in · profiles · persistent library)
- Mobile native apps (web-PWA only)
- Real Rive owl with viseme lip-sync (placeholder SVG)
- Multi-user · classroom · teacher dashboard
- Payments · subscriptions
- Social (sharing · friend activity · peer paths)
- Spaced repetition · long-term memory · teach-back
- Full hands-free voice conversation mode
- Offline mode
- Pre-warmed cache at scale (catalog not pre-generated)
- Subjects beyond the 4 (others on-demand only)
verdict: ~70% of "OmniTutor the company" done. Real, fast, shippable to a small private cohort. Can show investors, demo to teachers, validate the experience.
▸ M2 · ~6 weeks · productize
Open the gates
Make M1 safe to release publicly. Add the systems that turn a demo into a product.
▸ in M2
- Accounts · Google + email sign-in · profiles · persistent library
- Personal data layer · session history · progress tracking
- Payments · Stripe · free tier + paid plans · subscription mgmt
- Content safety · moderation · age-appropriate filtering
- Observability · cost dashboard · latency dashboard · error tracking
- Pre-warmed cache for top 100 trending lessons
- Email · transactional + onboarding sequences
- Privacy & legal · ToS · privacy policy · GDPR · COPPA-aware
- Better mobile-web (responsive sweep beyond M1's basics)
- Public landing page (marketing) separate from app
▸ NOT in M2
- Native iOS / Android apps
- Real Rive owl (still placeholder)
- Classroom / teacher tools
- Social features
- Catalog beyond on-demand
- Full voice-mode conversation
verdict: Public-launchable. Real users · real billing · real data. ~85% of company done.
▸ M3 · ~10 weeks · scale & sing
What makes it different
The features that make OmniTutor un-replicable. Native apps. The full owl. Classroom. Memory.
▸ in M3
- Real Rive owl with full viseme lip-sync · animated illustration · multiple expressions
- Native iOS & Android apps (SwiftUI + Compose) sharing the streaming engine
- Full hands-free voice conversation mode
- Spaced repetition · long-term memory · "what did we learn last week?"
- Curated catalog scaled to ~500 subjects · pre-generated polished lessons
- Classroom · teacher dashboard · multi-student management · assignments
- Social · share lessons · peer paths · friend activity feed
- Offline mode (PWA + native)
- API for partners (textbook publishers · schools · LMS integration)
- Internationalization (UI in 10+ languages)
▸ Beyond M3
- Hardware (custom learning device?)
- VR / AR experiences
- Direct school B2B sales motion
- Paid creator economy on top of OmniTutor
verdict: Defensible · differentiated · scalable. The product Mukesh has been describing, fully formed, replicable only by re-living the design journey.