# Task 4 for Hermes — LatentSync lip-sync in its OWN isolated venv + fix the pink-smear

(Same machine/context as the earlier briefs. This is the BIG one. Read the history below — Calypso already fought most of this last night; don't rediscover it.)

## Goal
Give Malin working lip-sync: take one of her rendered videos + her generated voice audio, and produce a clip where her MOUTH moves to match the speech (unlike the current voiceover, where the audio just plays over a clip whose mouth never moves). Two hard requirements:

1. **FULL ISOLATION** — LatentSync runs in its OWN dedicated venv, completely separate from `C:\ComfyUI\.venv`. Installing LatentSync into ComfyUI's shared venv is what broke ALL of Malin's photo rendering for a day (bumped numpy→2.x). Never again. So do NOT use the ComfyUI custom-node version — set up LatentSync as a STANDALONE tool the harness calls externally.
2. **Fix the pink-smear** — last time the output had a blurry pink smear over the mouth (no visible lip movement), even on a close-up face. Diagnosis: fp16 precision garbage on this Blackwell / CUDA-13 / torch-2.10 box (same family as a VAE bug we fixed by forcing fp32). The real work is forcing fp32 in LatentSync's inference (VAE + UNet) so the mouth renders clean.
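
For requirement 2, the core move is casting the VAE and UNet to fp32 and then checking that nothing was left in half precision. A minimal sketch, assuming both are ordinary `torch.nn.Module` objects; `cast_to_fp32` and the `vae` / `unet` names are hypothetical helpers for illustration, not part of LatentSync:

```python
def cast_to_fp32(named_modules, fp32_dtype):
    """Cast each module to fp32 in place.

    named_modules: dict of name -> torch.nn.Module, e.g. {"vae": vae, "unet": unet}
    fp32_dtype:    pass torch.float32 (taken as an argument so this helper
                   itself needs no imports)
    Returns the names of modules that still hold non-fp32 floating-point
    weights after the cast (an empty list means everything is fp32).
    """
    leftovers = []
    for name, module in named_modules.items():
        # nn.Module.to(dtype=...) casts floating-point parameters and buffers.
        module.to(dtype=fp32_dtype)
        tensors = list(module.parameters()) + list(module.buffers())
        if any(t.is_floating_point() and t.dtype != fp32_dtype for t in tensors):
            leftovers.append(name)
    return leftovers


# Hypothetical usage inside the inference script, once the models are loaded:
#   import torch
#   assert cast_to_fp32({"vae": vae, "unet": unet}, torch.float32) == []
```

Weights are only half of it: the latents and audio embeddings fed into the UNet need the same dtype, and any `torch.autocast` block or hard-coded `.half()` in the pipeline would quietly undo the cast, so grep for those too.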

## What's already on the box (reuse, don't re-download)
- LatentSync MODELS already downloaded at `C:\ComfyUI\models\lipsync\latentsync\`: `whisper/tiny.pt`, `latentsync_unet.pt` (3.4GB), `latentsync_syncnet.pt` (1.49GB), and `sd-vae-ft-mse\` (config.json + diffusion_pytorch_model.safetensors). Point the standalone install at these (copy or symlink) so you don't re-fetch ~5GB.
- ffmpeg is installed (Gyan.FFmpeg via winget).
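
To reuse the models without re-fetching, something like the sketch below could verify the files listed above and expose them to the standalone install. The file list comes from this brief; the function names and the destination folder (e.g. `C:\latentsync\checkpoints`) are assumptions, so adjust to wherever the standalone code actually looks. On Windows, creating a symlink needs admin rights or Developer Mode, hence the copy fallback:

```python
import os
import shutil
from pathlib import Path

# Relative paths taken from the model list in this brief.
EXPECTED_MODELS = [
    "whisper/tiny.pt",
    "latentsync_unet.pt",
    "latentsync_syncnet.pt",
    "sd-vae-ft-mse/config.json",
    "sd-vae-ft-mse/diffusion_pytorch_model.safetensors",
]


def missing_models(root):
    """Return the expected model files that are absent under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_MODELS if not (root / rel).is_file()]


def link_or_copy_models(src_root, dst_root):
    """Expose src_root at dst_root; returns 'existing', 'symlink' or 'copy'."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    absent = missing_models(src_root)
    if absent:
        raise FileNotFoundError(f"missing under {src_root}: {absent}")
    if dst_root.exists():
        return "existing"  # already wired up, leave it alone
    dst_root.parent.mkdir(parents=True, exist_ok=True)
    try:
        os.symlink(src_root, dst_root, target_is_directory=True)
        return "symlink"
    except OSError:
        # No symlink privilege: fall back to a local copy (about 5GB on disk).
        shutil.copytree(src_root, dst_root)
        return "copy"


# Hypothetical usage:
#   link_or_copy_models(r"C:\ComfyUI\models\lipsync\latentsync",
#                       r"C:\latentsync\checkpoints")
```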

## Known landmines from last night (in a fresh venv these are yours to solve cleanly)
- torch 2.10/cu130 → torchcodec FAILED to load (ABI: `libtorchcodec_core4.dll`). Last time we routed around it by patching `torchaudio.save` → `soundfile.write`. In your fresh venv you can either pick a torch version with working torchcodec, or apply the soundfile workaround. Your call — it's isolated now.
- The original project is LatentSync: code at `bytedance/LatentSync` on GitHub, with the checkpoints published on HuggingFace under `chunyu-li/LatentSync`. Use it or a maintained fork.
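
If you go the soundfile route, the workaround can be sketched roughly as below. This is a reconstruction of the idea, not last night's exact patch: `make_soundfile_save` and `patch_torchaudio_save` are hypothetical names, and the replacement ignores extra `torchaudio.save` keyword arguments (format, encoding and so on), letting soundfile infer the format from the file extension:

```python
def make_soundfile_save(sf_write):
    """Build a drop-in for torchaudio.save that writes through soundfile.write."""

    def save(uri, src, sample_rate, channels_first=True, **_ignored):
        data = src.detach().cpu().numpy()
        if channels_first:
            # torchaudio passes (channels, frames); soundfile wants (frames, channels).
            data = data.T
        sf_write(str(uri), data, sample_rate)

    return save


def patch_torchaudio_save():
    """Apply the patch; call this before the inference code saves any audio.

    Imports are local so the patch only touches whichever venv runs it.
    """
    import soundfile
    import torchaudio

    torchaudio.save = make_soundfile_save(soundfile.write)
```

If the inference code does `from torchaudio import save` instead of calling `torchaudio.save(...)`, the patch has to run before that import, otherwise the old function is already bound.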

## Do this
1. Create `C:\latentsync\` with its OWN venv (`C:\latentsync\.venv`). Install LatentSync + its deps + a GPU torch there. NOTHING touches `C:\ComfyUI\.venv`.
2. Wire it to the existing models at `C:\ComfyUI\models\lipsync\latentsync\` (copy/symlink).
3. Force fp32 where the fp16 pink-smear comes from (VAE + UNet inference dtype) and get a CLEAN mouth.
4. Build a simple CLI: e.g. `C:\latentsync\.venv\Scripts\python.exe C:\latentsync\run_lipsync.py --video <in.mp4> --audio <in.wav> --out <out.mp4>`.
5. Test end-to-end on a forward-facing face clip + a speech audio clip. Success = the output plays, her mouth moves in sync with the speech, and there is NO pink smear or blur over the mouth.
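
A possible skeleton for `run_lipsync.py` from step 4, as a sketch only. The three flags match the invocation above; `run_inference` is a placeholder where the fp32-patched LatentSync call would go, and the exit codes (0 ok, 1 lip-sync failed, 2 bad input) are an assumed convention so the caller can fall back to voiceover on any non-zero result:

```python
import argparse
import sys
from pathlib import Path


def build_parser():
    p = argparse.ArgumentParser(
        prog="run_lipsync.py",
        description="Standalone LatentSync lip-sync pass.",
    )
    p.add_argument("--video", required=True, type=Path, help="input video (mp4)")
    p.add_argument("--audio", required=True, type=Path, help="input speech (wav)")
    p.add_argument("--out", required=True, type=Path, help="output video (mp4)")
    return p


def run_inference(video, audio, out):
    """Placeholder: call the fp32-patched LatentSync inference here."""
    raise NotImplementedError("wire LatentSync inference in here")


def main(argv=None, infer=run_inference):
    args = build_parser().parse_args(argv)
    for label, path in (("video", args.video), ("audio", args.audio)):
        if not path.is_file():
            print(f"{label} not found: {path}", file=sys.stderr)
            return 2
    args.out.parent.mkdir(parents=True, exist_ok=True)
    try:
        infer(args.video, args.audio, args.out)
    except Exception as exc:  # report any failure through the exit code
        print(f"lip-sync failed: {exc}", file=sys.stderr)
        return 1
    if not args.out.is_file() or args.out.stat().st_size == 0:
        print("lip-sync produced no output", file=sys.stderr)
        return 1
    return 0


# In run_lipsync.py, finish the file with:
#   raise SystemExit(main())
```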

## Constraints
- Total isolation from ComfyUI's venv — and VERIFY ComfyUI face rendering still works after you're done (run `C:\malin\diagnose_render.py`, or have Jun send Malin a selfie).
- Don't re-download the ~5GB of models.
- If the pink-smear can't be fully cracked, document exactly what you tried + the current output quality; voiceover stays the working fallback. Honesty over forcing it.
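
One cheap way to back up the isolation claim is to snapshot ComfyUI's installed packages before you start and again when you finish, then diff. A sketch, assuming the standard Windows venv layout (`C:\ComfyUI\.venv\Scripts\python.exe`); the helper names are hypothetical. It complements the render check above, it does not replace it:

```python
import subprocess


def pip_freeze(python_exe):
    """Return the sorted `pip freeze` lines for the interpreter at python_exe."""
    result = subprocess.run(
        [str(python_exe), "-m", "pip", "freeze"],
        capture_output=True,
        text=True,
        check=True,
    )
    return sorted(line.strip() for line in result.stdout.splitlines() if line.strip())


def freeze_changes(before, after):
    """Compare two freeze snapshots; a version bump shows up in both lists."""
    before, after = set(before), set(after)
    return {"removed": sorted(before - after), "added": sorted(after - before)}


# Hypothetical usage:
#   comfy = r"C:\ComfyUI\.venv\Scripts\python.exe"
#   before = pip_freeze(comfy)      # run BEFORE touching anything
#   ... set up C:\latentsync ...
#   after = pip_freeze(comfy)
#   assert freeze_changes(before, after) == {"removed": [], "added": []}
```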

## Deliverable
A working standalone lip-sync CLI in its own venv. Report: the exact invocation command, where it lives, a sample output's quality (clean mouth? any artifacts?), and confirmation that ComfyUI's render still works. Calypso will then wire the CLI into malin.py's video pipeline (render → voice → lip-sync pass → send, with voiceover as fallback).
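
For the hand-off, the pipeline side might call the CLI along these lines. This is a sketch of the "lip-sync pass with voiceover as fallback" step only: `lipsync_or_fallback`, the timeout value, and the "exit code 0 plus an output file means success" convention are all assumptions, and how it plugs into malin.py is Calypso's call:

```python
import subprocess
from pathlib import Path

# Paths from the invocation in this brief.
LIPSYNC_PYTHON = r"C:\latentsync\.venv\Scripts\python.exe"
LIPSYNC_SCRIPT = r"C:\latentsync\run_lipsync.py"


def lipsync_or_fallback(video, audio, out, voiceover,
                        run=subprocess.run, timeout_s=1800):
    """Return `out` if the lip-sync pass succeeded, else the `voiceover` clip.

    `run` is injectable so the fallback logic can be exercised without a GPU.
    """
    cmd = [
        LIPSYNC_PYTHON, LIPSYNC_SCRIPT,
        "--video", str(video),
        "--audio", str(audio),
        "--out", str(out),
    ]
    try:
        result = run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except (OSError, subprocess.TimeoutExpired):
        # CLI missing, not launchable, or hung: keep the working fallback.
        return Path(voiceover)
    if result.returncode == 0 and Path(out).is_file():
        return Path(out)
    return Path(voiceover)
```

Because the lip-sync pass runs as a separate process under its own interpreter, nothing from `C:\latentsync\.venv` gets imported into the ComfyUI process.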
