Performance
100% (37/37)
166s (activation → loaded)
2.261× (1.24× – 8.02×)
8.6s (4.1s – 36.0s)
Capabilities
Base En
Sound Effects: Out of scope
Failure Modes
Multilingual
Code Switching
Cloning
Voice Design: Out of scope
Pronunciation
Pauses
Streaming: Out of scope
Long Form
Showcase / extended cases
Voice cloning
1 case
ext-clone-out
This is a cloning test using a reference recording, on the eight billion parameter flagship.
First actual cloning test on v1.5. The harness 4.x cases fell back to the default voice because the preset was missing on fresh activation; this extended test passed the same desk-ref preset that worked on Local. The RTF of 5.49x is higher than v1.5's median of 2.22x because cloning adds reference-processing overhead. Played A/B against the Local clone; see the comparison notes in moss-tts-local.md.
Standard harness
Base English
5 cases
1.1-short
Hello, this is the first sentence.
1.2-medium
The quick brown fox jumps over the lazy dog, and afterwards goes to sleep.
1.3-question
Could you please confirm whether the deployment succeeded?
1.4-exclamation
Watch out, that is dangerous!
1.5-long-paragraph
The deployment process began at six in the morning. By half past seven, the first replicas were warm and serving traffic. Engineers checked the dashboards every few minutes, watching for the subtle latency increase that always preceded a regression. The new model had been tested for weeks in staging, but production traffic exposed edge cases that no synthetic load could simulate.
Failure modes
5 cases
11.1-empty
(no text)
11.2-punctuation-only
!?...?!
11.3-mixed-script
Hello 世界 مرحبا こんにちは namaste
11.4-symbols
$1,234.56 (75% off) @ 3PM EST
11.5-markdown-residue
<b>Hello</b> **world** _italic_
Multilingual
6 cases
2.1-zh
你好,今天天气很好,适合出去散步。
2.2-ja
こんにちは、お元気ですか?今日もいい天気ですね。
2.3-es
Hola, ¿cómo estás hoy? Espero que muy bien.
2.4-fr
Bonjour, comment allez-vous aujourd'hui?
2.5-ar
مرحبا، كيف حالك اليوم؟ أتمنى أن تكون بخير.
2.6-hi
नमस्ते, आप कैसे हैं? आज मौसम बहुत अच्छा है।
Code switching
3 cases
3.1-en-zh
I'll meet you at the 茶馆 at three in the afternoon.
3.2-en-es
She said hola and then waved goodbye.
3.3-en-ja
The Japanese word for thank you is ありがとう.
Voice cloning
10 cases
4.1-clone-clean-5s
The quick brown fox jumps over the lazy dog.
4.2-clone-clean-15s
The quick brown fox jumps over the lazy dog.
4.3-clone-clean-30s
The quick brown fox jumps over the lazy dog.
4.4-clone-noisy
The quick brown fox jumps over the lazy dog.
4.5-clone-accented
The quick brown fox jumps over the lazy dog.
4.6-clone-whispered
The quick brown fox jumps over the lazy dog.
4.7-clone-raspy
The quick brown fox jumps over the lazy dog.
4.8-clone-reverb
The quick brown fox jumps over the lazy dog.
4.9-clone-child
The quick brown fox jumps over the lazy dog.
4.10-clone-cross-lang
你好,今天天气很好。
Pronunciation
4 cases
6.1-irish-name
Her name is Saoirse Ronan.
6.2-brand-hyundai
I drive a Hyundai Ioniq.
6.3-gif-vs-jif
Save the file as a GIF and not a JPEG.
6.4-sql
We use SQL to query the database.
Pauses
3 cases
7.1-pause-short
Wait, [pause 0.5s] for it.
7.2-pause-medium
She paused, [pause 1.5s] then continued.
7.3-pause-long
And then [pause 3.0s] silence.
Issues
max-new-tokens-none
high / fixed
kwargs.get('max_new_tokens', 4096) returned None when the key was present with a None value; v1.5's modeling code then calls range(None), which raises TypeError. Fixed via an 'or 4096' fallback in both MossLocalBackend and MossV15Backend.
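The difference between a dict.get default and an 'or' fallback can be sketched as follows. The helper name resolve_max_new_tokens is hypothetical and is not taken from either backend; it only illustrates the pattern described above.

```python
def resolve_max_new_tokens(kwargs: dict, default: int = 4096) -> int:
    """Return a usable token budget even when the key is present but None."""
    # dict.get only applies its default when the key is absent, so an
    # explicit None passes straight through. `or` swaps in the fallback.
    # Caveat: `or` also replaces an explicit 0 with the default.
    return kwargs.get("max_new_tokens") or default


request = {"max_new_tokens": None}
print(request.get("max_new_tokens", 4096))                # None
print(resolve_max_new_tokens(request))                    # 4096
print(resolve_max_new_tokens({"max_new_tokens": 256}))    # 256
print(resolve_max_new_tokens({}))                         # 4096
```

If a caller ever needs to pass 0 deliberately, an explicit `is None` check would be the safer variant.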
liveness-restart
high / fixed
The synth handlers were declared async def and blocked the event loop during multi-minute synthesis, so the kubelet killed the pod. Fixed by converting tts_generate/tts_stream/create_preset to plain def so FastAPI runs them in its threadpool. Confirmed working: no restarts during the v1.5 test session.
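The effect can be reproduced with the standard library alone. The sketch below is an analogy, not the service code: slow_synthesis is a hypothetical stand-in for the blocking synthesis call, and asyncio.to_thread (Python 3.9+) plays the role of FastAPI's threadpool for plain def handlers. A background ticker measures how long the event loop goes without running.

```python
import asyncio
import time


def slow_synthesis(seconds: float) -> str:
    """Hypothetical stand-in for a blocking synthesis call."""
    time.sleep(seconds)
    return "audio"


async def longest_loop_gap(handler) -> float:
    """Return the longest gap between event-loop ticks while handler runs."""
    gaps = []

    async def ticker():
        last = time.monotonic()
        while True:
            await asyncio.sleep(0.01)
            now = time.monotonic()
            gaps.append(now - last)
            last = now

    task = asyncio.create_task(ticker())
    await asyncio.sleep(0.05)   # let the ticker record a few normal ticks
    await handler()
    await asyncio.sleep(0.05)   # let the ticker observe any stall
    task.cancel()
    return max(gaps)


async def blocking_handler():
    # Runs on the event-loop thread: nothing else is served meanwhile,
    # including health-check requests.
    slow_synthesis(0.3)


async def threaded_handler():
    # Runs in a worker thread: the loop stays free for other requests.
    await asyncio.to_thread(slow_synthesis, 0.3)


blocked_gap = asyncio.run(longest_loop_gap(blocking_handler))
threaded_gap = asyncio.run(longest_loop_gap(threaded_handler))
print(f"blocking: {blocked_gap:.2f}s stall, threaded: {threaded_gap:.2f}s stall")
```

The blocking variant stalls the loop for roughly the whole synthesis time, which is the window in which a health probe would go unanswered.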
pipe-deadlock
critical / fixed
The worker subprocess's stdout/stderr were captured to subprocess.PIPE, which the orchestrator never drained; tqdm progress bars filled the 64 KB buffer and blocked the worker on pipe_write. Fixed by inheriting pipes.
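A minimal sketch of the two safe patterns, assuming a hypothetical worker that writes a configurable number of bytes to stderr in place of tqdm output. The first function mirrors the fix described above (inherit the parent's streams); the second is an alternative that keeps the pipes but drains them.

```python
import subprocess
import sys


def child_cmd(n_bytes: int) -> list[str]:
    """Hypothetical worker: writes n_bytes to stderr, like a chatty progress bar."""
    code = f"import sys; sys.stderr.write('x' * {n_bytes}); print('done')"
    return [sys.executable, "-c", code]


def run_worker_inherited(n_bytes: int) -> int:
    """Let the child write directly to the parent's stdout/stderr."""
    # Leaving stdout/stderr at their default (None) means "inherit", so
    # there is no pipe buffer that the parent would have to drain.
    proc = subprocess.Popen(child_cmd(n_bytes))
    return proc.wait()


def run_worker_drained(n_bytes: int) -> tuple[int, int]:
    """Alternative: capture output, but drain both pipes with communicate()."""
    proc = subprocess.Popen(
        child_cmd(n_bytes),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    _out, err = proc.communicate()  # reads both pipes until EOF, then waits
    return proc.returncode, len(err)


if __name__ == "__main__":
    print(run_worker_inherited(10))
    print(run_worker_drained(200_000))  # (0, 200000)
```

Calling proc.wait() on the PIPE variant without reading would reproduce the deadlock once the child's output exceeds the pipe buffer, which is why the drained version uses communicate() instead.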
service-selector-overlap
medium / open
The era-tts and era-tts-moss Services both select app=era-tts, so requests round-robin to both pods. Workaround: port-forward directly to the pod.
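The overlap follows from how equality-based selectors match: a Service selects every pod whose labels contain all of the selector's key/value pairs. The sketch below models that rule in plain Python. The pod names and the variant label are hypothetical and only illustrate one possible way to separate the two Services; the report leaves this issue open.

```python
def selects(selector: dict[str, str], pod_labels: dict[str, str]) -> bool:
    """A selector matches when every key/value pair appears in the pod labels."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


def backends(selector: dict[str, str], pods: dict[str, dict[str, str]]) -> list[str]:
    """Names of the pods a Service with this selector would route to."""
    return sorted(name for name, labels in pods.items() if selects(selector, labels))


# Hypothetical pods; only the app=era-tts label comes from the report.
pods = {
    "era-tts-pod": {"app": "era-tts", "variant": "base"},
    "era-tts-moss-pod": {"app": "era-tts", "variant": "moss"},
}

# Current situation: both Services use the same selector and hit both pods.
print(backends({"app": "era-tts"}, pods))
# ['era-tts-moss-pod', 'era-tts-pod']

# With an extra (assumed) label in each selector, each Service hits one pod.
print(backends({"app": "era-tts", "variant": "moss"}, pods))  # ['era-tts-moss-pod']
print(backends({"app": "era-tts", "variant": "base"}, pods))  # ['era-tts-pod']
```

Note that narrowing only one of the two selectors would not be enough: the Service that still selects app=era-tts alone would keep matching both pods.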
v15-faster-than-local-architectural
low / by-design
v1.5 8B has a lower RTF than Local-Transformer 1.7B (median 2.22x vs 4.39x). This is counter-intuitive but consistent across all 7 categories. A likely explanation is that v1.5 uses larger fused matmuls per generation step and therefore needs fewer total steps per second of audio.
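For readers comparing the two figures, the sketch below shows how a median RTF could be computed. It assumes RTF means synthesis wall-clock time divided by the duration of the generated audio, which matches the report's usage (lower is faster) but is not defined in it. The sample timings are made up for illustration and are not harness results.

```python
from statistics import median


def rtf(synthesis_seconds: float, audio_seconds: float) -> float:
    """Real-time factor under the assumed definition: compute time / audio time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds


# Made-up (synthesis_seconds, audio_seconds) pairs, one per test case.
samples = [(8.0, 4.0), (3.0, 2.0), (12.0, 3.0)]

per_case = [rtf(synth, audio) for synth, audio in samples]
print(per_case)          # [2.0, 1.5, 4.0]
print(median(per_case))  # 2.0
```

Under this definition an RTF of 2.0x means two seconds of compute per second of audio, so a model with a lower median RTF is the faster one regardless of parameter count.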
Deployment
era-tts-moss
era-core
nvidia-tesla-a100
gke-spot (preemptible)
era-tts-model-cache-moss
1800s