Performance
76% (35/46)
45s (activation → loaded)
4.247× (3.53× – 8.74×)
13.4s (3.7s – 104.0s)
Capabilities
Base En
Sound Effects (out of scope)
Failure Modes
Multilingual
Code Switching
Cloning
Voice Design (out of scope)
Pronunciation
Pauses
Streaming (out of scope)
Long Form
Showcase / extended cases
Voice cloning
1 case
ext-clone-out
This is a cloning test using a reference recording.
First cloning attempt. Does output match reference voice?
Pauses
2 cases
ext-pause-zh
我今天学习了一首中国的古诗,它的名字是[pause 3.2s]静夜思!
Upstream model card example. Does [pause 3.2s] produce silence here?
ext-pause-en
Listen carefully.[pause 2.0s]The answer is forty two.
Pause re-test in English.
Long form
1 case
ext-longform
The deployment process began at six in the morning. By half past seven, the first replicas were warm and serving traffic. Engineers checked the dashboards every few minutes, watching for the subtle latency increase that always preceded a regression. The new model had been tested for weeks in staging, but production traffic exposed edge cases that no synthetic load could simulate. By noon, the team had isolated the issue and shipped a fix.
Output truncated: only 26s of audio despite ~80 words and max_new_tokens=8192.
speed-determinism
3 cases
ext-warm-1
Hello, this is a repeatability test.
Sequential synth attempt 1 of 3 with identical input. Different audio length each call → model non-deterministic without seed.
ext-warm-2
Hello, this is a repeatability test.
Sequential synth attempt 2 of 3 with identical input.
ext-warm-3
Hello, this is a repeatability test.
Sequential synth attempt 3 of 3 with identical input. RTF stable at ~3.6-3.7x across attempts; latency variance is from audio-length variance.
speed-ttfb
3 cases
ext-ttfb-short
Hello, this is a short sentence.
TTFB via /api/v1/tts/stream. TTFB == total latency, which confirms that /stream delivers a pre-rendered buffer in chunks rather than decoding incrementally.
ext-ttfb-zh
你好,今天天气很好,适合出去散步。
TTFB on /stream for Chinese. Difference between TTFB and total latency < 1 ms.
ext-ttfb-long
The deployment process began at six in the morning. By half past seven the first replicas were warm.
TTFB on /stream for long English. Difference between TTFB and total latency < 1 ms.
Standard harness
Base English
5 cases
1.1-short
Hello, this is the first sentence.
1.2-medium
The quick brown fox jumps over the lazy dog, and afterwards goes to sleep.
1.3-question
Could you please confirm whether the deployment succeeded?
1.4-exclamation
Watch out, that is dangerous!
1.5-long-paragraph
The deployment process began at six in the morning. By half past seven, the first replicas were warm and serving traffic. Engineers checked the dashboards every few minutes, watching for the subtle latency increase that always preceded a regression. The new model had been tested for weeks in staging, but production traffic exposed edge cases that no synthetic load could simulate.
Failure modes
5 cases
11.1-empty
(no text)
11.2-punctuation-only
!?...?!
11.3-mixed-script
Hello 世界 مرحبا こんにちは namaste
11.4-symbols
$1,234.56 (75% off) @ 3PM EST
11.5-markdown-residue
<b>Hello</b> **world** _italic_
('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
Multilingual
6 cases
2.1-zh
你好,今天天气很好,适合出去散步。
2.2-ja
こんにちは、お元気ですか?今日もいい天気ですね。
2.3-es
Hola, ¿cómo estás hoy? Espero que muy bien.
2.4-fr
Bonjour, comment allez-vous aujourd'hui?
2.5-ar
مرحبا، كيف حالك اليوم؟ أتمنى أن تكون بخير.
2.6-hi
नमस्ते, आप कैसे हैं? आज मौसम बहुत अच्छा है।
Code switching
3 cases
3.1-en-zh
I'll meet you at the 茶馆 at three in the afternoon.
3.2-en-es
She said hola and then waved goodbye.
3.3-en-ja
The Japanese word for thank you is ありがとう.
Voice cloning
10 cases
4.1-clone-clean-5s
The quick brown fox jumps over the lazy dog.
('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
4.2-clone-clean-15s
The quick brown fox jumps over the lazy dog.
HTTPConnectionPool(host='localhost', port=8090): Max retries exceeded with url: /api/v1/tts/generate (Caused by NewConne
4.3-clone-clean-30s
The quick brown fox jumps over the lazy dog.
HTTPConnectionPool(host='localhost', port=8090): Max retries exceeded with url: /api/v1/tts/generate (Caused by NewConne
4.4-clone-noisy
The quick brown fox jumps over the lazy dog.
HTTPConnectionPool(host='localhost', port=8090): Max retries exceeded with url: /api/v1/tts/generate (Caused by NewConne
4.5-clone-accented
The quick brown fox jumps over the lazy dog.
HTTPConnectionPool(host='localhost', port=8090): Max retries exceeded with url: /api/v1/tts/generate (Caused by NewConne
4.6-clone-whispered
The quick brown fox jumps over the lazy dog.
HTTPConnectionPool(host='localhost', port=8090): Max retries exceeded with url: /api/v1/tts/generate (Caused by NewConne
4.7-clone-raspy
The quick brown fox jumps over the lazy dog.
HTTPConnectionPool(host='localhost', port=8090): Max retries exceeded with url: /api/v1/tts/generate (Caused by NewConne
4.8-clone-reverb
The quick brown fox jumps over the lazy dog.
HTTPConnectionPool(host='localhost', port=8090): Max retries exceeded with url: /api/v1/tts/generate (Caused by NewConne
4.9-clone-child
The quick brown fox jumps over the lazy dog.
HTTPConnectionPool(host='localhost', port=8090): Max retries exceeded with url: /api/v1/tts/generate (Caused by NewConne
4.10-clone-cross-lang
你好,今天天气很好。
HTTPConnectionPool(host='localhost', port=8090): Max retries exceeded with url: /api/v1/tts/generate (Caused by NewConne
Pronunciation
4 cases
6.1-irish-name
Her name is Saoirse Ronan.
6.2-brand-hyundai
I drive a Hyundai Ioniq.
6.3-gif-vs-jif
Save the file as a GIF and not a JPEG.
6.4-sql
We use SQL to query the database.
Pauses
3 cases
7.1-pause-short
Wait, [pause 0.5s] for it.
7.2-pause-medium
She paused, [pause 1.5s] then continued.
7.3-pause-long
And then [pause 3.0s] silence.
Issues
pipe-deadlock
critical · fixed
Worker subprocess pipe deadlock: root cause of all activation hangs across L4 and A100 attempts.
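The report does not describe the exact deadlock mechanism. As a hedged illustration of the general class of bug, the sketch below shows one common form: a parent that waits on a child without draining its pipes can hang once the child fills the OS pipe buffer. The child command here is a hypothetical stand-in, not the actual worker, and `communicate()` is shown as the standard remedy, not necessarily the fix that was applied.

```python
import subprocess
import sys

# Hypothetical stand-in for a worker that writes a lot to stdout
# (more than a typical OS pipe buffer).
CHILD_CODE = "import sys; sys.stdout.write('x' * 1_000_000)"


def run_worker_safely(timeout_s: float = 30.0) -> int:
    """Run the child and drain its pipes while waiting, so it cannot block on a full pipe."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # communicate() reads stdout and stderr until EOF, then reaps the child.
    # Calling proc.wait() here instead could hang: the child blocks writing to a
    # full pipe while the parent blocks waiting for the child to exit.
    out, _err = proc.communicate(timeout=timeout_s)
    return len(out)


if __name__ == "__main__":
    print(run_worker_safely())
```

The `timeout` argument also turns a silent hang into a `subprocess.TimeoutExpired` exception, which is easier to diagnose during activation.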
liveness-restart
high · open
Liveness probe times out during long synthesis because a synchronous call blocks the event loop; 4 pod restarts observed in 85 min of testing. Production blocker: each restart drops in-flight requests.
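One way to keep a probe responsive, sketched here under assumptions, is to move the blocking call off the event loop with `asyncio.to_thread` (Python 3.9+). The names `synthesize_blocking`, `healthz`, and `handle_request` are hypothetical and do not come from the service code; the sleep stands in for model inference.

```python
import asyncio
import time


def synthesize_blocking(text: str) -> bytes:
    """Hypothetical stand-in for a synchronous synthesis call; sleeps instead of running a model."""
    time.sleep(0.5)
    return text.encode("utf-8")


async def healthz() -> str:
    """Hypothetical liveness handler; it only needs the event loop to be free."""
    return "ok"


async def handle_request(text: str) -> bytes:
    # Run the blocking call in a worker thread so the event loop keeps serving probes.
    return await asyncio.to_thread(synthesize_blocking, text)


async def probe_latency_during_synthesis() -> float:
    """Start a synthesis, then measure how long a health check takes while it runs."""
    task = asyncio.create_task(handle_request("hello"))
    start = time.perf_counter()
    await asyncio.sleep(0)  # yield once so the synthesis task starts
    status = await healthz()
    elapsed = time.perf_counter() - start
    await task
    assert status == "ok"
    return elapsed


if __name__ == "__main__":
    print(f"probe answered in {asyncio.run(probe_latency_during_synthesis()):.3f}s")
```

If `handle_request` called `synthesize_blocking` directly, the probe in this sketch would wait for the full synthesis before being answered, which is the behaviour the issue describes.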
non-deterministic-no-seed
medium · open
Without an explicit `seed` parameter, identical (text, preset_id) input produces different audio durations and prosody across calls, which rules out hash-based output caching. Untested: whether passing `seed` actually makes output deterministic.
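A minimal sketch of a cache key that respects this constraint: unseeded requests are never cached, and seeded requests hash (text, preset_id, seed) together. The function name is hypothetical, and the sketch assumes that a fixed `seed` yields reproducible audio, which the report lists as untested.

```python
import hashlib
import json
from typing import Optional


def cache_key(text: str, preset_id: str, seed: Optional[int]) -> Optional[str]:
    """Return a stable key for seeded requests, or None when the output should not be cached."""
    if seed is None:
        # Unseeded output varies between calls, so a cached copy would not be representative.
        return None
    payload = json.dumps(
        {"text": text, "preset_id": preset_id, "seed": seed},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Serializing the fields as JSON before hashing avoids collisions that plain string concatenation could produce when a delimiter appears inside the text.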
streaming-endpoint-not-incremental
low · by-design
/api/v1/tts/stream serves a pre-rendered buffer chunked at 4 KB, so TTFB equals total synthesis latency. moss_tts_delay is batch-only by architecture. True streaming would require the Realtime variant (upstream config bug).
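The TTFB comparison can be sketched as a small timing helper over any iterator of byte chunks. The two generators below are simulations, not the real endpoint: one renders everything and then emits 4 KB chunks, the other emits chunks as work completes. All names are hypothetical; against a real server, `open_stream` would wrap the HTTP request so that the timer covers it.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_ttfb(open_stream: Callable[[], Iterable[bytes]]) -> Tuple[float, float, int]:
    """Return (ttfb_s, total_s, n_bytes) for a chunked byte stream."""
    start = time.perf_counter()
    ttfb = None
    n_bytes = 0
    for chunk in open_stream():
        if ttfb is None and chunk:
            ttfb = time.perf_counter() - start
        n_bytes += len(chunk)
    total = time.perf_counter() - start
    if ttfb is None:
        ttfb = total  # empty stream: no first byte ever arrived
    return ttfb, total, n_bytes


def prerendered(delay_s: float = 0.2, size: int = 16384, chunk: int = 4096) -> Iterator[bytes]:
    """Simulated server: render everything first, then emit 4 KB chunks back to back."""
    time.sleep(delay_s)
    buf = b"\0" * size
    for i in range(0, size, chunk):
        yield buf[i:i + chunk]


def incremental(delay_s: float = 0.2, n_chunks: int = 4, chunk: int = 4096) -> Iterator[bytes]:
    """Simulated server: emit each chunk as soon as its share of the work is done."""
    for _ in range(n_chunks):
        time.sleep(delay_s / n_chunks)
        yield b"\0" * chunk


if __name__ == "__main__":
    for name, source in (("prerendered", prerendered), ("incremental", incremental)):
        ttfb, total, n = measure_ttfb(source)
        print(f"{name}: ttfb={ttfb:.3f}s total={total:.3f}s bytes={n}")
```

A gap between TTFB and total latency close to zero points to pre-rendered delivery; a gap that grows with output length points to incremental decoding.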
service-selector-overlap
medium · open
The era-tts and era-tts-moss Services both select app=era-tts, so requests round-robin across both pods. Workaround: port-forward directly to the pod.
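The overlap follows from how equality-based label selectors work: a Service routes to every pod whose labels contain all of the selector's key/value pairs. The sketch below models that rule in plain Python. The pod names and the `variant` label are assumptions for illustration; the report only states that both Services select app=era-tts.

```python
from typing import Dict, List

Labels = Dict[str, str]


def selects(selector: Labels, pod_labels: Labels) -> bool:
    """Equality-based selector: every selector pair must appear in the pod's labels."""
    return all(pod_labels.get(k) == v for k, v in selector.items())


def endpoints(selector: Labels, pods: Dict[str, Labels]) -> List[str]:
    """Names of the pods a Service with this selector would route to."""
    return sorted(name for name, labels in pods.items() if selects(selector, labels))


# Hypothetical pods; the "variant" label is an assumption, not taken from the cluster.
pods = {
    "era-tts-pod": {"app": "era-tts", "variant": "default"},
    "era-tts-moss-pod": {"app": "era-tts", "variant": "moss"},
}

overlapping = {"app": "era-tts"}                  # matches both pods
narrowed = {"app": "era-tts", "variant": "moss"}  # matches only the MOSS pod

if __name__ == "__main__":
    print(endpoints(overlapping, pods))
    print(endpoints(narrowed, pods))
```

Under these assumptions, adding a distinguishing label to each Deployment and to the matching Service selector would remove the need for the port-forward workaround.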
long-form-cap
medium · open
max_new_tokens=8192 only extends audio to ~26s (no proportional gain over 4096).
pause-tag-literal
medium · open
[pause X.Ys] markers are pronounced literally, even though this is the canonical upstream syntax per the model card. The feature may be v1.5-only, or the input may need different preprocessing.
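If the model keeps reading the markers aloud, one possible client-side workaround (untested against this model) is to strip the markers before synthesis, synthesize each text segment separately, and insert silence of the requested length between them. The sketch below covers only the parsing step; `split_pauses` is a hypothetical helper, and the regex assumes the `[pause X.Ys]` form used in the test cases.

```python
import re
from typing import List, Tuple, Union

# Matches markers such as "[pause 3.2s]" or "[pause 2s]".
PAUSE_RE = re.compile(r"\[pause\s+(\d+(?:\.\d+)?)s\]")

Segment = Union[Tuple[str, str], Tuple[str, float]]


def split_pauses(text: str) -> List[Segment]:
    """Split text into ("text", str) and ("pause", seconds) segments, in order."""
    segments: List[Segment] = []
    pos = 0
    for m in PAUSE_RE.finditer(text):
        before = text[pos:m.start()].strip()
        if before:
            segments.append(("text", before))
        segments.append(("pause", float(m.group(1))))
        pos = m.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(("text", tail))
    return segments


if __name__ == "__main__":
    print(split_pauses("Listen carefully.[pause 2.0s]The answer is forty two."))
```

Splitting at the marker changes the prosody at the boundary, since each segment is synthesized without its neighbour as context, so this is a fallback and not a substitute for native pause support.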
Deployment
era-tts-moss
era-core
nvidia-tesla-a100
gke-spot (preemptible)
era-tts-model-cache-moss
3600s