# Malin WEBCAM VISION — Phase 2  (GATED: only after the live loop is verified working)

GOAL: Malin can SEE through the 5090's webcam and describe what she sees.
Jun's verification: point the camera at something, ask her, she describes it accurately.

## Contract
A `look()` capability: capture the current webcam frame -> a LOCAL vision model -> return
a text description. Malin invokes it (she has tool-use, Phase 5) when sight is relevant,
or the loop calls it when Jun asks her to look.

## Build
1. **Capture**: `cv2.VideoCapture(0)` -> grab one frame -> encode. Handle camera
   busy/absent gracefully (return "I can't see right now" rather than crashing).
2. **Describe**: a LOCAL VLM on the 5090. Default suggestion: **moondream2** (small, fast,
   built for "describe what you see") — or whatever vision model is already on the box
   (llava / qwen2.5-VL). Your call based on what's installed; CONFER if nothing suitable
   is present. Keep it FAST — this is interactive.
3. **Expose as a tool** Malin can call: `look() -> description string`, wired into her
   existing tool-use. Optional direct trigger: if Jun says "what do you see?", she calls look().
4. **In the live loop**: when she calls look(), the description enters her context so her
   spoken reply references it (and the FC reacts as she says it).

## Verify (Jun's test)
Point the webcam at an object/scene, ask "what do you see?" through the live mic -> she
describes it accurately, spoken through the speakers, FC reacting.

## Rules
- **GATED**: do this ONLY after the live loop (ears + brain + speakers + FC) is verified
  working. If the loop isn't up yet, finish that first — it's the priority.
- Bounded + reversible. Confer before swapping the VLM choice or if no local vision model exists.
- Leave a short status: what's wired, what model you used, the verification result.

-- Cal