# FC Runtime — Video-Harvested, Layered, Interruptible Expression Player
**Cal, 6/6. The implementation design for the video-frame pivot. Jun + Hermes produce the emotion videos; this is how the FC PLAYS them so she feels alive + responsive, not locked into canned clips.**

## The problem this solves (Jun's question)
Playing a whole emotion video = a fixed, non-interruptible timeline. If something falls while she's 8s into a smile clip, she can't react. Canned playback breaks the "alive" illusion. The fix: don't PLAY clips, HARVEST them into short, layered, interruptible units driven by a real-time controller.

## Core principle
The VIDEO gives natural, identity-locked motion (the content the warp couldn't fake). A rig-like RUNTIME (blend layers + priority interrupts) gives responsiveness. Best of both: the rig thinking was always really about the runtime, and the video solves the content.

## 1. Harvest spec (what we curate from the emotion videos)
From each emotion render, extract (NOT the whole clip):
- **Key poses** — single frames at meaningful beats: `neutral`, onset-quarter, peak, settle. (The morph "keys" idea, now sourced from real video.)
- **Short transition snippets** — the natural ONSET (neutral→peak, ~0.3-1.0s) and OFFSET (peak→neutral). These carry the real muscle dynamics + onset order ([[expression_onset_reference]]).
- **Idle loops** — short, seamless loops of micro-motion (breath, blink, gaze drift) harvested from calm stretches.

**Curation rules:**
- Trim to the frames where the onset reads right (corners-then-eyes for a smile, etc.).
- Align/stabilize so HEAD POSE is consistent across a swappable set, else the avatar bobs when it switches states: pick a stable-pose stretch or register frames to a reference.
- Naming: `<emotion>_<beat>.png` for poses, `<emotion>_onset.webm` / `<emotion>_offset.webm` for snippets.
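One way to "pick a stable-pose stretch" is sketched below. It assumes some per-frame head-pose estimator (not specified in this note) has already produced a `(yaw, pitch, roll)` tuple per frame. It slides a fixed-length window over the frames and keeps the window whose summed per-axis variance is smallest. `most_stable_window`, `pose_filename`, and `snippet_filename` are hypothetical helper names. The filename helpers just encode the naming rule above.

```python
from statistics import pvariance


def pose_spread(poses):
    """Sum of per-axis population variance of (yaw, pitch, roll) over a window."""
    return sum(pvariance(axis) for axis in zip(*poses))


def most_stable_window(poses, length):
    """Start index of the `length`-frame window with the least head-pose spread.

    `poses` is a list of (yaw, pitch, roll) tuples, assumed to come from a
    head-pose estimator run over the emotion render (hypothetical input).
    """
    if length <= 0 or length > len(poses):
        raise ValueError("window length must be between 1 and len(poses)")
    best_start, best_spread = 0, float("inf")
    for start in range(len(poses) - length + 1):
        spread = pose_spread(poses[start:start + length])
        if spread < best_spread:  # strict '<' keeps the earliest tie
            best_start, best_spread = start, spread
    return best_start


def pose_filename(emotion, beat):
    """Key-pose naming rule: <emotion>_<beat>.png."""
    return f"{emotion}_{beat}.png"


def snippet_filename(emotion, kind):
    """Snippet naming rule: <emotion>_onset.webm / <emotion>_offset.webm."""
    if kind not in ("onset", "offset"):
        raise ValueError("kind must be 'onset' or 'offset'")
    return f"{emotion}_{kind}.webm"
```

Exhaustive sliding is fine at clip lengths (a few hundred frames). Registering frames to a reference pose is the alternative when no stretch is stable enough.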

## 2. Runtime architecture — LAYERS (the key to responsiveness)
Decompose the face into independently-driven layers so a reaction never has to wait for an unrelated channel:
- **Base / idle layer** (always on, looping, interruptible): breath, blink, micro-gaze. She's never frozen.
- **Expression layer** (event-driven): plays short onset → hold → offset snippets per `[PERFORM:]`.
- **(Phase 2) Micro-channels**: split further into eyes/brows, mouth, and head pose so a startle can fire eyes+brows INSTANTLY while the mouth finishes a word. Additive, like game-engine animation layers / Live2D params.
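A minimal sketch of the layer stack, assuming frames are plain identifiers (e.g. filenames) and assuming a hypothetical three-way channel split (`eyes_brows`, `mouth`, `head`) for Phase 2. In P0 this would collapse to a single whole-face channel. The idle layer always yields a frame for every channel and simply wraps its seamless loop. The expression layer overrides only the channels it is currently playing, so a startle on eyes/brows never waits on the mouth.

```python
from dataclasses import dataclass, field

# Hypothetical Phase 2 channel split; P0 would use a single whole-face channel.
CHANNELS = ("eyes_brows", "mouth", "head")


@dataclass
class IdleLayer:
    """Always-on base layer: a seamless loop, so the cursor just wraps."""
    frames: list
    cursor: int = 0

    def step(self):
        frame = self.frames[self.cursor]
        self.cursor = (self.cursor + 1) % len(self.frames)
        return {ch: frame for ch in CHANNELS}


@dataclass
class ExpressionLayer:
    """Event-driven layer: per-channel frame queues, empty when idle."""
    queues: dict = field(default_factory=dict)

    def play(self, channel, frames):
        # Replacing the queue is the "interrupt": no waiting for the old clip.
        self.queues[channel] = list(frames)

    def step(self):
        out = {}
        for ch, queue in list(self.queues.items()):
            if queue:
                out[ch] = queue.pop(0)
            if not queue:
                del self.queues[ch]  # channel falls back to idle next step
        return out


def composite(idle, expression):
    """Per channel, the expression layer wins; untouched channels keep idling."""
    frame = idle.step()
    frame.update(expression.step())
    return frame
```

Because idle always advances underneath, a channel that finishes its expression drops straight back into the breathing/blinking loop with no frozen frame.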

## 3. Event + interrupt model
- Input: `[PERFORM: emotion intensity]` from malin.py (already built) + ambient events (new chat input, vision events later).
- **Every event carries a PRIORITY.** A higher-priority event INTERRUPTS the current expression: crossfade from the CURRENT frame to the new target's onset — never wait for the clip to finish.
- "Something falls" → a high-priority `startle` interrupts whatever's playing via a fast (~120ms) crossfade to the surprise onset, then resolves.
- Low-priority / same-emotion events just retarget smoothly. Idle resumes when nothing's active.
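The interrupt rules above can be sketched as a small decision function. This assumes the three priority tiers named in section 4 (reaction > conversational > idle), and it reads "low-priority / same-emotion events retarget smoothly" literally: anything that is not strictly higher priority retargets rather than cuts in. `ExpressionController`, `Transition`, and the 400ms smooth-blend value are hypothetical. The 120ms figure is the one from this note.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Priority(IntEnum):
    """Tiers from the router extension: reaction > conversational > idle."""
    IDLE = 0
    CONVERSATIONAL = 1
    REACTION = 2


@dataclass
class Event:
    emotion: str
    intensity: float
    priority: Priority


@dataclass
class Transition:
    kind: str          # "start", "interrupt", or "retarget"
    target: Event
    crossfade_ms: int


INTERRUPT_CROSSFADE_MS = 120  # the fast ~120ms crossfade from the note
SMOOTH_CROSSFADE_MS = 400     # hypothetical slower blend for starts/retargets


class ExpressionController:
    """Decides, per incoming event, whether to start, interrupt, or retarget."""

    def __init__(self) -> None:
        self.active: Optional[Event] = None

    def handle(self, event: Event) -> Transition:
        if self.active is None:
            # Nothing playing: blend out of the idle loop into the onset.
            self.active = event
            return Transition("start", event, SMOOTH_CROSSFADE_MS)
        if event.priority > self.active.priority:
            # Higher priority: cut in from the CURRENT frame, never wait.
            self.active = event
            return Transition("interrupt", event, INTERRUPT_CROSSFADE_MS)
        if event.emotion == self.active.emotion:
            # Same emotion: only intensity moves; keep the stronger priority.
            event = Event(event.emotion, event.intensity,
                          max(self.active.priority, event.priority))
        self.active = event
        return Transition("retarget", event, SMOOTH_CROSSFADE_MS)

    def finish(self) -> None:
        """Call when the active offset snippet completes; idle resumes."""
        self.active = None
```

Keeping the decision separate from frame playback means the renderer only has to honor `kind` and `crossfade_ms`; the priority policy can change in one place.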

## 4. Integration with the existing FC
- Keep the current transport: the controller writes/reads `fc_state.json` (the floating window already consumes it) OR the :1238 event client. Add fields: `active_expression`, `intensity`, `priority`, `interrupt`, `frame_cursor`.
- malin.py's performance router already emits `[PERFORM:]` — extend it to set priority (startle/reaction > conversational expression > idle).
- The window/renderer (Hermes) switches from warp/inpaint compositing to playing snippet frames with crossfade-on-interrupt.
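A sketch of the `fc_state.json` side, under stated assumptions. The file's current schema isn't given here, so the writer assumes a flat JSON object and merges the five new fields rather than overwriting existing keys. The `[PERFORM: emotion intensity]` parse follows the tag shape in section 3. `PRIORITY`, `REACTION_EMOTIONS`, `parse_perform`, and `write_fc_state` are hypothetical stand-ins for whatever malin.py's router actually does. Writing to a temp file in the same directory and then calling `os.replace` swaps the file in one step on POSIX, so the floating window should not catch a half-written file.

```python
import json
import os
import re
import tempfile

PERFORM_RE = re.compile(r"\[PERFORM:\s*(\w+)\s+([0-9]*\.?[0-9]+)\s*\]")

# Hypothetical tier table: reaction > conversational > idle.
PRIORITY = {"idle": 0, "conversational": 1, "reaction": 2}
REACTION_EMOTIONS = {"startle"}  # hypothetical: which emotions count as reactions


def parse_perform(text):
    """Return (emotion, intensity, priority) from a [PERFORM:] tag, or None."""
    m = PERFORM_RE.search(text)
    if m is None:
        return None
    emotion, intensity = m.group(1), float(m.group(2))
    tier = "reaction" if emotion in REACTION_EMOTIONS else "conversational"
    return emotion, intensity, PRIORITY[tier]


def write_fc_state(path, *, active_expression, intensity, priority,
                   interrupt, frame_cursor):
    """Merge the controller fields into fc_state.json and replace it atomically."""
    state = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            state = json.load(f)
    state.update(
        active_expression=active_expression,
        intensity=intensity,
        priority=priority,
        interrupt=interrupt,
        frame_cursor=frame_cursor,
    )
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f)
        os.replace(tmp_path, path)  # single-step swap for the reader
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return state
```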

## 5. Walkthrough — the "something falls" case
Idle loop playing → Jun says something funny → `[PERFORM: amused 0.6]` (priority: conversational) → amused onset plays → **mid-hold, a loud noise event fires `[PERFORM: startle 0.9]` (priority: reaction)** → controller crossfades from the current amused frame to the startle onset in ~120ms → startle resolves → returns to idle (or back to conversational mood). She reacted in real time, mid-expression. That's the win.
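The ~120ms crossfade in this walkthrough is only a few rendered frames. A sketch, assuming 30 fps playback and frames as flat lists of pixel values (a toy stand-in for real image arrays): the amused frame is frozen at the moment of interrupt while the startle onset keeps advancing underneath, so the reaction motion begins on the very next frame instead of 120ms late. `crossfade_weights`, `blend`, and `interrupt_crossfade` are hypothetical names.

```python
def crossfade_weights(duration_ms, fps):
    """Per-frame weight of the incoming clip, ending at 1.0 (fully switched)."""
    n = max(1, round(duration_ms * fps / 1000))
    return [i / n for i in range(1, n + 1)]


def blend(frame_a, frame_b, alpha):
    """Linear mix per pixel: alpha=0 gives frame_a, alpha=1 gives frame_b."""
    return [round((1 - alpha) * a + alpha * b) for a, b in zip(frame_a, frame_b)]


def interrupt_crossfade(frozen_frame, onset_frames, duration_ms=120, fps=30):
    """Fade from the frozen current frame into an onset that keeps playing."""
    if not onset_frames:
        raise ValueError("need at least one onset frame")
    weights = crossfade_weights(duration_ms, fps)
    last = len(onset_frames) - 1
    faded = [
        # Onset advances during the fade; hold its last frame if it is short.
        blend(frozen_frame, onset_frames[min(i, last)], w)
        for i, w in enumerate(weights)
    ]
    return faded + onset_frames[len(weights):]
```

At 30 fps the 120ms fade rounds to 4 frames with weights 0.25, 0.5, 0.75, 1.0; after that the startle onset plays unblended and resolves as usual.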

## 6. Limitations (honest)
- Crossfading mid-motion is slightly less smooth than a purpose-rendered transition — small quality cost for big responsiveness. Worth it.
- Novel COMBINATIONS (smile WHILE surprised) need either a pre-rendered combo clip or the micro-channel layering (Phase 2) — layers scale better than a combinatorial clip library.
- It's video-sprite animation, not a true 3D rig — less flexible for arbitrary novel poses, but far more natural-looking. That's the trade we're choosing.
- Storage/curation: a clip+pose library per emotion. Keep the set small (core emotions) first.

## 7. Build phases (dumb version first, per [[feedback_dont_oversplit_dumb_version_first]])
- **P0**: idle loop + ONE interruptible expression (neutral↔smile onset/offset) playing from harvested frames, crossfade-on-interrupt. Proves responsiveness.
- **P1**: a small emotion set (smile, surprise, amused, thoughtful) as onset/hold/offset snippets + priority interrupts.
- **P2**: micro-channel layering (eyes/brows/mouth/head independent) for additive reactions + combos.
- **P3**: wire ambient events (vision/audio) as interrupt sources.

## Division of labor
Cal = this runtime design + the harvest/curation/alignment tooling + the controller logic. Jun + Hermes = the emotion video renders + video analysis (cost-routed to flat-rate). See [[project_malin_avatar]].
