Performance
100% (41/41)
169s (activation → loaded)
2.036× (1.23× – 4.93×)
6.5s (4.8s – 47.0s)
Capabilities
Base En
Sound Effects (out of scope)
Failure Modes
Multilingual
Code Switching
Cloning
Voice Design (out of scope)
Pronunciation
Pauses
Streaming (out of scope)
Dialogue
Long Form
Showcase / extended cases
Voice cloning
1 case
ext-clone-out
This is a cloning test on the eight billion parameter dialogue specialist using a reference recording.
Same desk-ref preset used for the Local and v1.5 cloning tests, so the result is directly comparable to ext-clone-out in those models. The 4.93x RTF reflects cloning's reference-processing overhead versus TTSD's median of 2.04x.
Dialogue
4 cases
9.1-dialogue-2speaker-short
[S1] Did you finish the report? [S2] Yes, I sent it to you this morning. [S1] Great, thanks.
[S1]/[S2] tag-based dialogue. Confirmed audible voice change between speakers: TTSD's dialogue feature works without continuation-mode prompts.
9.2-dialogue-2speaker-long
[S1] So tell me about the new project. [S2] We are building a real-time TTS system for customer support. [S1] What is the latency target? [S2] Sub two hundred milliseconds, which means we need streaming inference end to end. [S1] That is ambitious. Have you tested it yet? [S2] First prototype this week. Looking good so far.
6-turn back-and-forth at 22.40s. Tests speaker-identity persistence across long dialogues.
9.3-dialogue-3speaker
[S1] I think we should ship this feature next week. [S2] Hold on, we have not tested the edge cases yet. [S3] I agree. Let us wait until QA signs off. [S1] Fine, but next week is the absolute deadline.
[S3] tag added. Tests whether 3+ speakers get distinct voices.
9.4-dialogue-bilingual
[S1] Welcome to our store. [S2] 你好,我想买一些礼物。 [S1] Of course, follow me.
Bilingual exchange (English [S1] + Mandarin [S2]). Tests whether speaker identity and language both switch correctly within the same dialogue.
Standard harness
Base English
5 cases
1.1-short
Hello, this is the first sentence.
1.2-medium
The quick brown fox jumps over the lazy dog, and afterwards goes to sleep.
1.3-question
Could you please confirm whether the deployment succeeded?
1.4-exclamation
Watch out, that is dangerous!
1.5-long-paragraph
The deployment process began at six in the morning. By half past seven, the first replicas were warm and serving traffic. Engineers checked the dashboards every few minutes, watching for the subtle latency increase that always preceded a regression. The new model had been tested for weeks in staging, but production traffic exposed edge cases that no synthetic load could simulate.
Failure modes
5 cases
11.1-empty
(no text)
11.2-punctuation-only
!?...?!
11.3-mixed-script
Hello 世界 مرحبا こんにちは namaste
11.4-symbols
$1,234.56 (75% off) @ 3PM EST
11.5-markdown-residue
<b>Hello</b> **world** _italic_
Multilingual
6 cases
2.1-zh
你好,今天天气很好,适合出去散步。
2.2-ja
こんにちは、お元気ですか?今日もいい天気ですね。
2.3-es
Hola, ¿cómo estás hoy? Espero que muy bien.
2.4-fr
Bonjour, comment allez-vous aujourd'hui?
2.5-ar
مرحبا، كيف حالك اليوم؟ أتمنى أن تكون بخير.
2.6-hi
नमस्ते, आप कैसे हैं? आज मौसम बहुत अच्छा है।
Code switching
3 cases
3.1-en-zh
I'll meet you at the 茶馆 at three in the afternoon.
3.2-en-es
She said hola and then waved goodbye.
3.3-en-ja
The Japanese word for thank you is ありがとう.
Voice cloning
10 cases
4.1-clone-clean-5s
The quick brown fox jumps over the lazy dog.
4.2-clone-clean-15s
The quick brown fox jumps over the lazy dog.
4.3-clone-clean-30s
The quick brown fox jumps over the lazy dog.
4.4-clone-noisy
The quick brown fox jumps over the lazy dog.
4.5-clone-accented
The quick brown fox jumps over the lazy dog.
4.6-clone-whispered
The quick brown fox jumps over the lazy dog.
4.7-clone-raspy
The quick brown fox jumps over the lazy dog.
4.8-clone-reverb
The quick brown fox jumps over the lazy dog.
4.9-clone-child
The quick brown fox jumps over the lazy dog.
4.10-clone-cross-lang
你好,今天天气很好。
Pronunciation
4 cases
6.1-irish-name
Her name is Saoirse Ronan.
6.2-brand-hyundai
I drive a Hyundai Ioniq.
6.3-gif-vs-jif
Save the file as a GIF and not a JPEG.
6.4-sql
We use SQL to query the database.
Pauses
3 cases
7.1-pause-short
Wait, [pause 0.5s] for it.
7.2-pause-medium
She paused, [pause 1.5s] then continued.
7.3-pause-long
And then [pause 3.0s] silence.
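The pause cases above embed inline markers of the form [pause 0.5s]. This page does not say whether the model interprets these tags itself or whether the harness handles them. As a minimal sketch, assuming a harness-side approach, the hypothetical helper below (split_pauses is not part of the tested backend) splits a case's text into spoken chunks and pause durations:

```python
import re
from typing import List, Union

# Matches markers such as "[pause 0.5s]" or "[pause 3.0s]".
PAUSE_RE = re.compile(r"\[pause\s+(\d+(?:\.\d+)?)s\]")


def split_pauses(text: str) -> List[Union[str, float]]:
    """Hypothetical helper: return spoken chunks (str) and pauses in seconds (float)."""
    parts: List[Union[str, float]] = []
    pos = 0
    for match in PAUSE_RE.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            parts.append(chunk)
        parts.append(float(match.group(1)))  # pause duration in seconds
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        parts.append(tail)
    return parts


if __name__ == "__main__":
    print(split_pauses("She paused, [pause 1.5s] then continued."))
```

A harness could synthesize each string chunk separately and insert silence of the given length between them, or compare the requested durations against the silence measured in the model's output.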
Issues
ttsd-no-explicit-dialogue-path
low, open
MossTTSDBackend.synthesize() uses processor mode='generation' (mirrors v1.5). Upstream TTSD usage recommends mode='continuation' with prompt audio plus a per-speaker reference for proper multi-speaker output. The [S1]/[S2] dialogue extended_cases probe whether generation mode alone produces distinguishable voices. If it does not, a dedicated synthesize_dialogue() method is the follow-up.
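If a dedicated synthesize_dialogue() method does become necessary, one likely first step is splitting a tagged script into per-speaker turns so each speaker can be paired with its own reference audio. The sketch below illustrates only that parsing step. The name split_turns and its return shape are assumptions made here, not part of MossTTSDBackend or the upstream TTSD API.

```python
import re
from typing import List, Tuple

# Matches speaker tags such as "[S1]", "[S2]", "[S3]".
SPEAKER_RE = re.compile(r"\[S(\d+)\]")


def split_turns(script: str) -> List[Tuple[int, str]]:
    """Hypothetical helper: return (speaker_number, text) pairs in script order."""
    turns: List[Tuple[int, str]] = []
    matches = list(SPEAKER_RE.finditer(script))
    for i, match in enumerate(matches):
        # A turn runs from the end of its tag to the start of the next tag.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(script)
        text = script[match.end():end].strip()
        if text:
            turns.append((int(match.group(1)), text))
    return turns


if __name__ == "__main__":
    script = "[S1] Welcome to our store. [S2] 你好,我想买一些礼物。 [S1] Of course, follow me."
    for speaker, text in split_turns(script):
        print(speaker, text)
```

Text appearing before the first tag is ignored in this sketch. Whether that is the right behavior depends on how the backend should treat untagged input.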
Deployment
era-tts-moss
era-core
nvidia-tesla-a100
gke-spot (preemptible)
era-tts-model-cache-moss
1800s