ClipCannon Voice Clone Pipeline - Official Benchmark Results
0.779 SeedTTS-Eval SIM | Beats Human Ground Truth by +0.049 | 0.975 on Clean Reference
SeedTTS-Eval Official Benchmark (1,088 Samples)
All samples were scored with the official SeedTTS-Eval speaker encoder (WavLM-Large + ECAPA-TDNN, 192-dim embeddings).
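For readers unfamiliar with the metric, here is a minimal sketch of how a single SIM value could be computed, assuming SIM is the cosine similarity between the speaker embedding of the generated clip and that of the reference clip. The function name `cosine_similarity` and the toy 4-dim vectors are illustrative only. Extracting the real 192-dim embeddings from audio is not shown, and this is not the official scoring script.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


# Toy 4-dim vectors stand in for real 192-dim speaker embeddings (assumed values).
reference_embedding = [1.0, 0.0, 1.0, 0.0]
clone_embedding = [1.0, 0.0, 0.0, 0.0]
sim = cosine_similarity(reference_embedding, clone_embedding)
print(f"SIM = {sim:.3f}")
```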
| Metric | Value |
|---|---|
| Mean SIM | 0.779 |
| Median SIM | 0.785 |
| Max SIM | 0.896 |
| p90 SIM | 0.842 |
| p75 SIM | 0.816 |
| p25 SIM | 0.748 |
| Min SIM | 0.520 |
| Samples | 1,088 |
| Runtime | 33.3 hours on RTX 5090 |
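The rows above are ordinary summary statistics over the per-sample SIM scores. The sketch below shows one way to derive them from a list of scores using only the Python standard library. The helper `summarize_sim` and the `toy_scores` values are hypothetical, and the percentile interpolation (`method="inclusive"`) is an assumption, so a different method may shift p25/p75/p90 slightly from the published figures.

```python
import statistics
from typing import Dict, Sequence


def summarize_sim(scores: Sequence[float]) -> Dict[str, float]:
    """Summarize per-sample SIM scores (requires at least two scores)."""
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    pct = statistics.quantiles(scores, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(scores),
        "median": statistics.median(scores),
        "max": max(scores),
        "p90": pct[89],
        "p75": pct[74],
        "p25": pct[24],
        "min": min(scores),
        "samples": len(scores),
    }


# Made-up scores for illustration; the real run has 1,088 of them.
toy_scores = [0.52, 0.70, 0.75, 0.78, 0.80, 0.82, 0.85, 0.90]
summary = summarize_sim(toy_scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```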
Beats Human Ground Truth by +0.049
Human recordings of the same speakers score 0.730 SIM on this benchmark, while my clones score 0.779, a margin of +0.049. By this metric, the clones carry a more consistent speaker identity than the real recordings do.