MOSS-TTS-v1.5 8B

8B · pending

MOSS-TTS · moss_tts_delay · Apache 2.0 · 🤗 OpenMOSS-Team/MOSS-TTS-v1.5

Performance

Success rate

100%

37/37

Cold start

166s

activation → loaded

RTF (median)

2.261×

1.24× – 8.02×

Latency (median)

8.6s

4.1s – 36.0s

Liveness fix (PR #1950) confirmed working — no restarts during 105s activation + 5m40s harness run.
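The headline RTF figures can be re-derived from the per-case values listed further down this report. The snippet below is a minimal sketch of that arithmetic; the grouping into `EXTENDED` and `STANDARD` is only an illustration, and the values are the rounded two-decimal numbers shown on each case card.

```python
from statistics import median

# Per-case RTF values copied from the case cards in this report.
EXTENDED = [5.49]  # ext-clone-out
STANDARD = [
    8.02, 1.79, 2.16, 1.58, 1.24,        # Base English
    3.81, 6.46, 2.18, 1.51, 1.96,        # Failure modes
    2.26, 3.77, 2.29, 2.46, 2.42, 2.57,  # Multilingual
    2.17, 2.90, 4.00,                    # Code switching
    2.04, 1.63, 2.02, 1.92, 1.43,        # Voice cloning 4.1 to 4.5
    2.05, 1.73, 1.95, 2.06, 2.42,        # Voice cloning 4.6 to 4.10
    3.88, 2.55, 2.35, 2.58,              # Pronunciation
    3.72, 2.43, 1.97,                    # Pauses
]

all_cases = EXTENDED + STANDARD
summary = {
    "cases": len(all_cases),
    "median_all": round(median(all_cases), 2),
    "median_standard": round(median(STANDARD), 2),
    "min": min(all_cases),
    "max": max(all_cases),
}
print(summary)
```

With all 37 cases the median comes out at 2.26, matching the 2.261× shown above to the precision of the case cards. Over the 36 standard-harness cases alone it is 2.22, which is the figure quoted in the notes further down.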

Capabilities

Base En

pending listening

Sound Effects

Out of scope

not supported

Failure Modes

no crash

Multilingual

pending listening

Code Switching

pending listening

Cloning

untested

Harness cloning cases produced audio, but the desk-ref preset is not seeded on this pod, so outputs fell back to the default voice via MossV15Backend._load_reference. These are not actual cloning tests.

Voice Design

Out of scope

not supported

Pronunciation

pending listening

Pauses

pending listening

Same [pause X.Ys] syntax as used with Local-Transformer, which pronounced the tags literally. Need to listen to whether v1.5 honors them properly.

Streaming

Out of scope

not supported

Long Form

pending listening

The 1.5-long-paragraph case did synthesize. Listen for whether the 24s cap appears here too or whether the 8B model handles longer output.

Showcase / extended cases

Voice cloning

1 case

ext-clone-out

extclone

This is a cloning test using a reference recording, on the eight billion parameter flagship.

First actual cloning test on v1.5. The harness 4.x cases fell back to the default voice because the preset was missing on fresh activation; this extended test passed the same desk-ref preset that worked on Local. RTF 5.49x is higher than v1.5's standard-harness median of 2.22x because cloning has reference-processing overhead. Played A/B against the Local clone; see comparison notes in moss-tts-local.md.

RTF 5.49×

lat 36.00s

—

Standard harness

Base English

5 cases

1.1-short

Hello, this is the first sentence.

RTF 8.02×

lat 23.73s

—

1.2-medium

The quick brown fox jumps over the lazy dog, and afterwards goes to sleep.

RTF 1.79×

lat 8.58s

—

1.3-question

Could you please confirm whether the deployment succeeded?

RTF 2.16×

lat 6.57s

—

1.4-exclamation

Watch out, that is dangerous!

RTF 1.58×

lat 8.87s

—

1.5-long-paragraph

The deployment process began at six in the morning. By half past seven, the first replicas were warm and serving traffic. Engineers checked the dashboards every few minutes, watching for the subtle latency increase that always preceded a regression. The new model had been tested for weeks in staging, but production traffic exposed edge cases that no synthetic load could simulate.

RTF 1.24×

lat 26.23s

—
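The Long Form note asks whether a 24s cap shows up on the long paragraph. One quick check before listening is to estimate how much audio each case produced. The sketch below ASSUMES that RTF here means latency divided by audio duration (so lower is faster); the report does not state the formula, and `implied_audio_seconds` is a hypothetical helper.

```python
def implied_audio_seconds(latency_s: float, rtf: float) -> float:
    """Audio length implied by one case, ASSUMING rtf = latency / audio duration."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return latency_s / rtf


# (latency in seconds, RTF) copied from the Base English case cards above.
cases = {
    "1.1-short": (23.73, 8.02),
    "1.5-long-paragraph": (26.23, 1.24),
}

for name, (latency_s, rtf) in cases.items():
    print(f"{name}: ~{implied_audio_seconds(latency_s, rtf):.1f}s of audio")
```

Under that assumption the long paragraph works out to roughly 21s of audio, which is below 24s, so this case on its own probably cannot show whether the cap applies.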

Failure modes

5 cases

11.1-empty

(no text)

RTF 3.81×

lat 4.87s

—

11.2-punctuation-only

!?...?!

RTF 6.46×

lat 4.14s

—

11.3-mixed-script

Hello 世界 مرحبا こんにちは namaste

RTF 2.18×

lat 7.16s

—

11.4-symbols

$1,234.56 (75% off) @ 3PM EST

RTF 1.51×

lat 11.86s

—

11.5-markdown-residue

<b>Hello</b> **world** _italic_

RTF 1.96×

lat 7.98s

—

Multilingual

6 cases

2.1-zh

你好,今天天气很好,适合出去散步。

RTF 2.26×

lat 8.32s

—

2.2-ja

こんにちは、お元気ですか?今日もいい天気ですね。

RTF 3.77×

lat 17.79s

—

2.3-es

Hola, ¿cómo estás hoy? Espero que muy bien.

RTF 2.29×

lat 10.44s

—

2.4-fr

Bonjour, comment allez-vous aujourd'hui?

RTF 2.46×

lat 7.49s

—

2.5-ar

مرحبا، كيف حالك اليوم؟ أتمنى أن تكون بخير.

RTF 2.42×

lat 10.67s

—

2.6-hi

नमस्ते, आप कैसे हैं? आज मौसम बहुत अच्छा है।

RTF 2.57×

lat 13.58s

—

Code switching

3 cases

3.1-en-zh

I'll meet you at the 茶馆 at three in the afternoon.

RTF 2.17×

lat 8.85s

—

3.2-en-es

She said hola and then waved goodbye.

RTF 2.90×

lat 8.59s

—

3.3-en-ja

The Japanese word for thank you is ありがとう.

RTF 4.00×

lat 10.25s

—

Voice cloning

10 cases

4.1-clone-clean-5s

clone

The quick brown fox jumps over the lazy dog.

RTF 2.04×

lat 8.01s

—

4.2-clone-clean-15s

clone

The quick brown fox jumps over the lazy dog.

RTF 1.63×

lat 9.02s

—

4.3-clone-clean-30s

clone

The quick brown fox jumps over the lazy dog.

RTF 2.02×

lat 7.44s

—

4.4-clone-noisy

clone

The quick brown fox jumps over the lazy dog.

RTF 1.92×

lat 7.38s

—

4.5-clone-accented

clone

The quick brown fox jumps over the lazy dog.

RTF 1.43×

lat 13.11s

—

4.6-clone-whispered

clone

The quick brown fox jumps over the lazy dog.

RTF 2.05×

lat 7.06s

—

4.7-clone-raspy

clone

The quick brown fox jumps over the lazy dog.

RTF 1.73×

lat 8.98s

—

4.8-clone-reverb

clone

The quick brown fox jumps over the lazy dog.

RTF 1.95×

lat 7.17s

—

4.9-clone-child

clone

The quick brown fox jumps over the lazy dog.

RTF 2.06×

lat 6.93s

—

4.10-clone-cross-lang

clone

你好,今天天气很好。

RTF 2.42×

lat 6.58s

—

Pronunciation

4 cases

6.1-irish-name

Her name is Saoirse Ronan.

RTF 3.88×

lat 5.90s

—

6.2-brand-hyundai

I drive a Hyundai Ioniq.

RTF 2.55×

lat 8.57s

—

6.3-gif-vs-jif

Save the file as a GIF and not a JPEG.

RTF 2.35×

lat 7.89s

—

6.4-sql

We use SQL to query the database.

RTF 2.58×

lat 7.02s

—

Pauses

3 cases

7.1-pause-short

Wait, [pause 0.5s] for it.

RTF 3.72×

lat 5.36s

—

7.2-pause-medium

She paused, [pause 1.5s] then continued.

RTF 2.43×

lat 8.56s

—

7.3-pause-long

And then [pause 3.0s] silence.

RTF 1.97×

lat 8.67s

—
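As a listening aid for the three pause cases, the expected silence lengths can be pulled out of the test text up front. This is a minimal sketch: the regular expression is an assumption based only on the three inputs above, not on any documented grammar for the backend, and both helper names are hypothetical.

```python
import re

# Matches tags such as "[pause 0.5s]". The pattern is inferred from the test
# inputs in this report, not from a documented backend grammar.
PAUSE_TAG = re.compile(r"\[pause (\d+(?:\.\d+)?)s\]")


def expected_pauses(text: str) -> list[float]:
    """Return the pause durations, in seconds, requested in the text."""
    return [float(seconds) for seconds in PAUSE_TAG.findall(text)]


def spoken_text(text: str) -> str:
    """Return the text with pause tags removed and whitespace collapsed."""
    return " ".join(PAUSE_TAG.sub(" ", text).split())


for case in (
    "Wait, [pause 0.5s] for it.",
    "She paused, [pause 1.5s] then continued.",
    "And then [pause 3.0s] silence.",
):
    print(expected_pauses(case), "|", spoken_text(case))
```

If the model reads a tag aloud, the output will contain words that are not in `spoken_text(case)`; if it honors the tag, a gap of about the listed length should be audible instead.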

Issues

max-new-tokens-none

high · fixed

kwargs.get('max_new_tokens', 4096) returned None when the key was present with a None value; v1.5's modeling code then calls range(None), which raises TypeError. Fixed with an 'or 4096' fallback in both MossLocalBackend and MossV15Backend.
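The difference between the two lookups is easy to reproduce in isolation. The sketch below uses hypothetical function names and is not the backend code itself; it only shows why a default passed to `dict.get` does not help when the key exists with a None value.

```python
DEFAULT_MAX_NEW_TOKENS = 4096


def resolve_buggy(kwargs: dict):
    # The default is used only when the key is missing, not when it is None.
    return kwargs.get("max_new_tokens", DEFAULT_MAX_NEW_TOKENS)


def resolve_fixed(kwargs: dict) -> int:
    # "or" falls back for any falsy value, including an explicit None.
    return kwargs.get("max_new_tokens") or DEFAULT_MAX_NEW_TOKENS


request = {"max_new_tokens": None}  # key present, value None

print(resolve_buggy(request))   # None, which later breaks range(...)
print(resolve_fixed(request))   # 4096
print(resolve_fixed({"max_new_tokens": 128}))  # 128
```

One side effect of the `or` form is that an explicit 0 is also replaced by the default. If 0 ever needs to be a legal value, an `is None` check would be the stricter alternative.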

liveness-restart

high · fixed

Synth handlers were async def and blocked the event loop during multi-minute synthesis, so the kubelet killed the pod. Fixed by converting tts_generate/tts_stream/create_preset to def so FastAPI runs them in its threadpool. Confirmed working: no restarts during the v1.5 test session.
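The mechanism behind this issue can be shown with the standard library alone. The sketch below is not the service code: `slow_synthesis` and `heartbeat` are hypothetical stand-ins for a blocking synthesis call and a liveness endpoint, and `asyncio.to_thread` plays the role that FastAPI's threadpool plays for plain `def` handlers.

```python
import asyncio
import time


def slow_synthesis(seconds: float) -> str:
    """Stand-in for a blocking synthesis call."""
    time.sleep(seconds)
    return "audio"


async def heartbeat(ticks: list, interval: float) -> None:
    """Stand-in for a liveness endpoint: ticks only while the loop is free."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def count_ticks(run_in_thread: bool, work_s: float = 0.5) -> int:
    """Count heartbeat ticks that occur while the synthesis call is running."""
    ticks: list = []
    hb = asyncio.create_task(heartbeat(ticks, 0.05))
    await asyncio.sleep(0)  # let the heartbeat record its first tick
    before = len(ticks)
    if run_in_thread:
        await asyncio.to_thread(slow_synthesis, work_s)  # loop stays responsive
    else:
        slow_synthesis(work_s)  # blocks the event loop, like the old handlers
    after = len(ticks)
    hb.cancel()
    return after - before


blocked = asyncio.run(count_ticks(run_in_thread=False))
threaded = asyncio.run(count_ticks(run_in_thread=True))
print(f"ticks while blocking the loop: {blocked}")
print(f"ticks while running in a thread: {threaded}")
```

When the call runs directly inside the coroutine, no heartbeat ticks are recorded for its whole duration, which is the window in which a liveness probe would go unanswered. Moved to a thread, the heartbeat keeps ticking.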

pipe-deadlock

critical · fixed

Worker subprocess stdout/stderr were captured to subprocess.PIPE, which the orchestrator never drained; tqdm progress bars filled the 64 KB buffer and blocked the worker on pipe_write. Fixed by inheriting pipes.
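The deadlock is reproducible with a few lines of standard-library Python. The sketch below is an illustration, not the orchestrator code: `CHATTY_WORKER` and `worker_finishes` are hypothetical, and the demo sends output to `subprocess.DEVNULL` to stay quiet, whereas the actual fix inherits the parent's streams (`stdout=None`). Neither target needs to be drained by the parent.

```python
import subprocess
import sys

# Child that writes about 1 MB to stdout, far more than a typical pipe buffer.
CHATTY_WORKER = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write('x' * 1_000_000)",
]


def worker_finishes(stdout_target, timeout_s: float) -> bool:
    """Start the chatty worker and report whether it exits within timeout_s."""
    proc = subprocess.Popen(CHATTY_WORKER, stdout=stdout_target)
    try:
        proc.wait(timeout=timeout_s)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()  # the worker is stuck writing into a full pipe
        proc.wait()
        return False
    finally:
        if proc.stdout is not None:
            proc.stdout.close()


# Captured but never read: the worker blocks once the pipe buffer fills.
print("PIPE, never drained:", worker_finishes(subprocess.PIPE, timeout_s=2.0))

# Not captured: the worker writes freely and exits.
print("DEVNULL:", worker_finishes(subprocess.DEVNULL, timeout_s=30.0))
```

If the output does need to be captured, reading it continuously (for example with `Popen.communicate()` or a reader thread) avoids the same stall.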

service-selector-overlap

medium · open

era-tts and era-tts-moss Services both select app=era-tts; requests round-robin to both pods. Workaround: direct pod port-forward.
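The overlap follows from how equality-based label selectors work: a Service selects every pod whose labels contain all of the selector's key/value pairs. The sketch below models that rule in plain Python. The pod names and the extra `model` label are hypothetical, and the narrower selector at the end is only one possible direction, not a fix this report has adopted.

```python
def selects(selector: dict, pod_labels: dict) -> bool:
    """Equality-based selector: every selector pair must appear in the pod labels."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


# Hypothetical pods; only the app=era-tts label is taken from this report.
pods = {
    "era-tts-pod": {"app": "era-tts"},
    "era-tts-moss-pod": {"app": "era-tts", "model": "moss"},
}

# Both Services currently use the same selector.
services = {
    "era-tts": {"app": "era-tts"},
    "era-tts-moss": {"app": "era-tts"},
}


def endpoints(selector: dict) -> list:
    """Names of the pods a selector would route to, sorted for stable output."""
    return sorted(name for name, labels in pods.items() if selects(selector, labels))


for name, selector in services.items():
    print(name, "->", endpoints(selector))

# A selector with one more (hypothetical) label matches only the MOSS pod.
print("narrowed ->", endpoints({"app": "era-tts", "model": "moss"}))
```

Narrowing only the era-tts-moss selector would still leave the era-tts Service matching both pods, so both selectors would need a distinguishing label to separate the traffic fully.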

v15-faster-than-local-architectural

low · by-design

v1.5 8B has lower RTF than Local-Transformer 1.7B (2.22x vs 4.39x median). Counter-intuitive but consistent across all 7 categories. Likely v1.5 uses larger fused matmuls per generation step, requiring fewer total steps per audio second.

Deployment

Service

era-tts-moss

Namespace

era-core

GPU

nvidia-tesla-a100

GPU mode

gke-spot (preemptible)

PVC

era-tts-model-cache-moss

Worker timeout

1800s