Product Design and Development · Ulrich & Eppinger · same dialogue, different TTS engines
Two models survived testing tonight: F5-TTS (flow-matching) and XTTS v2 (Coqui).
Higgs Audio v2 was finally dropped: it was leaking Harry Potter content from the broom_salesman reference. F5-TTS showed a similar leak at first (it kept saying "particularly in a moment of need"), which is why you'll see _clean in filenames. Those files use auto-transcribed reference audio to prevent the leak.
Verified clean: all 5 F5-TTS files were spot-checked by local Whisper transcription, and the content matches the script verbatim. Both XTTS files were also verified clean.
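For anyone repeating the spot-check, here is a minimal sketch of the comparison step. It assumes you already have the Whisper transcript as a plain string; the function names, the normalization rule, and the sample sentences are illustrative, not the script that was actually run.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase the text and keep only word-like tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def check_transcript(script: str, transcript: str, leak_phrases=()) -> dict:
    """Compare a transcript against the script and flag known leak phrases.

    'matches_script' is True only when the word sequences are identical
    after normalization, so punctuation and casing differences are ignored.
    """
    script_words = normalize(script)
    heard_words = normalize(transcript)
    # Pad with spaces so a phrase only matches on whole-word boundaries.
    heard_joined = " " + " ".join(heard_words) + " "
    leaks = [
        phrase
        for phrase in leak_phrases
        if " " + " ".join(normalize(phrase)) + " " in heard_joined
    ]
    return {"matches_script": script_words == heard_words, "leaks": leaks}


# Hypothetical sample text, not taken from the actual chapter scripts.
LEAKS = ["particularly in a moment of need"]
script = "Concept selection narrows the field."
clean = check_transcript(script, "concept selection narrows the field", LEAKS)
leaky = check_transcript(
    script,
    "Concept selection narrows the field, particularly in a moment of need.",
    LEAKS,
)
```

Word-for-word matching is strict on purpose: Whisper may spell numbers or names differently from the script, so a mismatch is a prompt to listen to the file, not proof of a leak.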
What to compare: Ch 4 and Ch 5 each have BOTH F5-TTS and XTTS versions — listen to both and tell me which voice you prefer. F5-TTS is faster-paced (~140-160 wpm). XTTS is slower (~117 wpm). Decide what works on the bus.
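The wpm figures are just script word count divided by audio length in minutes. A rough sketch of that arithmetic follows; the word counts and durations are made-up round numbers for illustration, not measurements from these files.

```python
def words_per_minute(script: str, duration_seconds: float) -> float:
    """Return speaking rate as whitespace-separated words per minute of audio."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    word_count = len(script.split())
    return word_count / (duration_seconds / 60.0)


# Hypothetical example: a 234-word passage rendered as 2 minutes of audio.
slow_rate = words_per_minute(" ".join(["word"] * 234), 120.0)   # 117.0
# The same idea for a 300-word passage, also 2 minutes long.
fast_rate = words_per_minute(" ".join(["word"] * 300), 120.0)   # 150.0
```

At these rates the same chapter runs roughly a quarter longer in the slower voice, which may matter more than voice quality on a fixed-length commute.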
New chapters (6–10): Dialogues authored from the just-OCR'd Ch 6–10 of Ulrich & Eppinger. Each chapter has both F5-TTS and XTTS versions ready for A/B comparison. F5-TTS uses --auto-transcribe to prevent the broom_salesman ref-audio leak (same fix as Ch 1–5).