# Malin — Model-Swap Media Render Pipeline (spec)

**PROBLEM (Jun caught it):** the 5090 is maxed (~511MB free) with Dolphin-24B (19GB) + Qwen3-VL (6GB) + Chatterbox all pinned. The VRAM-guard queue (selfie waits for ≥6GB free, video ≥14GB) NEVER drains — there's never that much free. So backlogged [SELFIE:]/[VIDEO:] jobs never fire; they pile up forever.

**FIX:** free VRAM on demand by temporarily UNLOADING a model, render, reload. The guard changes from "wait for free VRAM" (never comes) to "make room → render → restore."
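
To make the stuck guard concrete, here is a minimal sketch of the old "wait for free VRAM" check fed with the numbers above. It assumes free memory is read from `nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits` (which reports MiB) and that 1GB is treated as 1024 MiB. The function and constant names are hypothetical, not the pipeline's actual code.

```python
import subprocess

# Thresholds from the spec, converted to MiB (assumption: 1GB ~ 1024 MiB).
SELFIE_NEED_MB = 6 * 1024
VIDEO_NEED_MB = 14 * 1024


def parse_free_mb(nvidia_smi_output: str) -> int:
    """Parse the first GPU's free memory (MiB) from nvidia-smi CSV output."""
    return int(nvidia_smi_output.strip().splitlines()[0])


def read_free_mb() -> int:
    """Ask nvidia-smi for free VRAM; requires the NVIDIA driver tools on PATH."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.free",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_free_mb(out)


def old_guard_allows(free_mb: int, need_mb: int) -> bool:
    """The old guard: fire only if enough VRAM is already free."""
    return free_mb >= need_mb
```

With ~511MB free, `old_guard_allows(511, SELFIE_NEED_MB)` is false and stays false as long as all three models are pinned, which is why the queue never drains.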

## Flow
When a queued media job's turn comes (PREFER an idle window — not mid-reply):
1. Check free VRAM. If already enough, render directly.
2. Else SWAP to free the needed amount (via LM Studio load/unload API):
   - **SELFIE (~6GB):** unload the VISION model (Qwen3-VL, 6GB) — lowest impact, vision is only used when Jun cues a look. → render selfie via ComfyUI → reload Qwen3-VL.
   - **VIDEO (~14GB):** unload Qwen3-VL (6GB) AND Dolphin-24B (brain, 19GB), freeing ~25GB total → render the video (~1 min) → reload both.
3. Show a STATUS while swapped (e.g. "taking your photo, back in a sec"), so Jun knows why Malin is briefly unresponsive and doesn't expect her to see/think during it.
4. After render: reload the unloaded model(s) (warmup: a few seconds for vision, ~6s for brain), then resume. Media lands in Jun's Malin Telegram DM.
5. Process the queue one job at a time this way.
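
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: the model keys, `plan_swap`, and `render_with_swap` are hypothetical names, and the `unload`, `load`, `render`, and `status` callables stand in for the LM Studio load/unload calls, the ComfyUI render, and the status message, none of which are specified here.

```python
from typing import Callable, List

# Needs and unload sets taken from the spec; the string keys are hypothetical.
NEED_GB = {"selfie": 6, "video": 14}
UNLOAD_FOR = {
    "selfie": ["qwen3-vl"],
    "video": ["qwen3-vl", "dolphin-24b"],
}


def plan_swap(kind: str, free_gb: float) -> List[str]:
    """Step 1/2: nothing to unload if enough VRAM is already free."""
    if free_gb >= NEED_GB[kind]:
        return []
    return list(UNLOAD_FOR[kind])


def render_with_swap(
    kind: str,
    free_gb: float,
    unload: Callable[[str], None],
    load: Callable[[str], None],
    render: Callable[[str], None],
    status: Callable[[str], None],
) -> List[str]:
    """Make room, render, restore. Returns models that failed to reload."""
    plan = plan_swap(kind, free_gb)
    unloaded: List[str] = []
    failed: List[str] = []
    if plan:
        status(kind)  # step 3: tell Jun why she is briefly offline
    try:
        for name in plan:
            unload(name)
            unloaded.append(name)
        render(kind)
    finally:
        # Step 4: always try to restore, even if the render raised.
        # Reverse order brings the brain back before vision.
        for name in reversed(unloaded):
            try:
                load(name)
            except Exception:
                failed.append(name)
    return failed
```

A non-empty return value means a sense (or the brain) is still missing after the render; the caller would hand that list to the readiness supervisor rather than retry inline.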

## Honest tradeoff
During a render the unloaded sense is OFFLINE: a SELFIE = she can't SEE for a few seconds; a VIDEO = she can't see OR think for ~a minute (brain unloaded). That's the real cost of rendering on a maxed GPU — Jun accepted the delay/queue tradeoff. Minimize disruption: render during idle windows + show the "back in a sec" status.

## Coupling / safety
- Use LM Studio's load/unload API for the swaps.
- Gate the swap on live_loop being IDLE (not mid-reply) — coordinate with the readiness supervisor/heartbeat.
- **Failure mode to guard:** if a RELOAD fails, she's left missing a sense (or her brain) until restart. The readiness SUPERVISOR must detect a model that failed to reload and recover it. This is WHY the supervisor is built first.
- Keep it reversible + toggleable: swapping should be easy to switch off without touching the rest of the pipeline.
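
As a rough sketch of the safety points above, the gate and the supervisor's recovery pass might look like this. Everything here is an assumption for illustration: `may_swap`, `recover_missing`, `is_loaded`, and `load` are hypothetical names, and the real check would go through whatever LM Studio exposes for listing and loading models.

```python
from typing import Callable, Iterable, List


def may_swap(enabled: bool, loop_idle: bool) -> bool:
    """Swap only if the feature toggle is on AND live_loop is idle."""
    return enabled and loop_idle


def recover_missing(
    expected: Iterable[str],
    is_loaded: Callable[[str], bool],
    load: Callable[[str], None],
    max_attempts: int = 3,
) -> List[str]:
    """Supervisor pass: reload any expected model that is not loaded.

    Returns the models still missing after max_attempts tries each.
    """
    still_missing: List[str] = []
    for name in expected:
        if is_loaded(name):
            continue
        for _ in range(max_attempts):
            try:
                load(name)
            except Exception:
                continue  # failed attempt; try again
            if is_loaded(name):
                break  # verified back online
        else:
            # No attempt ended with the model verified as loaded.
            still_missing.append(name)
    return still_missing
```

Verifying with `is_loaded` after each `load` call, instead of trusting that the call returned, is what lets the supervisor catch a reload that silently failed.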

## Sequencing
3rd + heaviest build, AFTER: (1) readiness supervisor + startup, (2) green eye-glow. Build it only once those two are proven — the supervisor is the safety net for a failed reload. Test carefully.
