MOSS-SoundEffect 8B

8B · pending
MOSS-TTS · moss_tts_delay · Apache 2.0 · 🤗 OpenMOSS-Team/MOSS-SoundEffect

Performance

Success rate

95%

39/41
Cold start

190s

activation → loaded
RTF (median)

1.301×

min 1.00× · max 4.62×
Latency (median)

13.1s

min 10.1s · max 46.5s
Ran on the same A100 pod as Local/v1.5/TTSD. Every output is a fixed ~10.08s clip regardless of prompt complexity, because the SFX model emits a constant-length codec stream. Two transient errors occurred during the 41-case harness: 1.1-short returned 'Resource temporarily unavailable' on the first call (likely an activation race), and 2.3-es triggered a worker crash followed by an auto-respawn (the subsequent 2.4-fr case took 46s against the ~13s norm, consistent with a model reload). Both recovered without intervention.
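The headline figures above (success rate, median RTF, median latency) can be reproduced from the per-case records. The following is a minimal sketch, not the harness's actual code: the record keys `ok`, `rtf`, and `latency_s` are hypothetical names chosen for illustration, and the sample uses only the five Base English cases listed further down.

```python
from statistics import median

def summarize(results):
    """Summarize harness results.

    `results` is a list of dicts with hypothetical keys: 'ok' (bool) and,
    for successful cases only, 'rtf' and 'latency_s' (floats).
    Medians are taken over successful cases only.
    """
    ok = [r for r in results if r["ok"]]
    return {
        "success_rate": len(ok) / len(results),
        "rtf_median": median(r["rtf"] for r in ok),
        "latency_median_s": median(r["latency_s"] for r in ok),
    }

# The five Base English cases from the harness listing.
sample = [
    {"ok": False},                                   # 1.1-short (http_error)
    {"ok": True, "rtf": 1.21, "latency_s": 12.18},   # 1.2-medium
    {"ok": True, "rtf": 1.06, "latency_s": 10.64},   # 1.3-question
    {"ok": True, "rtf": 1.21, "latency_s": 12.15},   # 1.4-exclamation
    {"ok": True, "rtf": 1.00, "latency_s": 10.05},   # 1.5-long-paragraph
]
summary = summarize(sample)
```

Run over all 41 records, the same function would give 39/41 for the success rate, which rounds to the 95% shown above.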

Capabilities

Base En

Out of scope
not supported
The model is sound-effect only and does not synthesize speech. The speech cases in the harness (1.x through 7.x, plus 11.x) produce 10s of SFX/noise rather than the requested speech and should be treated as control points, not pass/fail tests.
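One way to keep control points out of any quality scoring is to split cases by their numeric group prefix. This is a sketch under one assumption: that case IDs always start with `<group>.<index>-`, as they do in the harness listing, with group 10 being the sound-effect set. The function names are illustrative, not part of the harness.

```python
def is_sfx_case(case_id: str) -> bool:
    """True for sound-effect cases (the 10.x group); all others are speech prompts."""
    group = case_id.split(".", 1)[0]
    return group == "10"

def split_cases(case_ids):
    """Partition case IDs into (scored SFX cases, control-point cases)."""
    scored = [c for c in case_ids if is_sfx_case(c)]
    control = [c for c in case_ids if not is_sfx_case(c)]
    return scored, control

scored, control = split_cases(
    ["1.1-short", "10.1-sfx-footsteps", "11.1-empty", "2.3-es", "10.5-sfx-mechanical"]
)
```

Control-point cases still count toward availability (HTTP success, latency), but they would not receive a listening verdict.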

Sound Effects

pending
The model's specialty. Five showcase prompts (10.1–10.5) cover ambient, weather, percussive impact, animal sequence, and sustained mechanical sounds. Listening verdict pending.

Failure Modes

Out of scope
not supported

Multilingual

Out of scope
not supported

Code Switching

Out of scope
not supported

Cloning

Out of scope
not supported

Voice Design

Out of scope
not supported

Pronunciation

Out of scope
not supported

Pauses

Out of scope
not supported

Streaming

Out of scope
not supported

Dialogue

Out of scope
not supported

Standard harness

Base English

5 cases

1.1-short

http_error

Hello, this is the first sentence.

500: {"detail":"Synthesis failed: [Errno 11] Resource temporarily unavailable"}
no audio

1.2-medium

The quick brown fox jumps over the lazy dog, and afterwards goes to sleep.

RTF 1.21×
lat 12.18s

1.3-question

Could you please confirm whether the deployment succeeded?

RTF 1.06×
lat 10.64s

1.4-exclamation

Watch out, that is dangerous!

RTF 1.21×
lat 12.15s

1.5-long-paragraph

The deployment process began at six in the morning. By half past seven, the first replicas were warm and serving traffic. Engineers checked the dashboards every few minutes, watching for the subtle latency increase that always preceded a regression. The new model had been tested for weeks in staging, but production traffic exposed edge cases that no synthetic load could simulate.

RTF 1.00×
lat 10.05s

Sound effects

5 cases

10.1-sfx-footsteps

footsteps on gravel, walking slowly

RTF 1.30×
lat 13.10s

10.2-sfx-thunder

rolling thunder in the distance, followed by heavy rain

RTF 1.41×
lat 14.23s

10.3-sfx-glass

a glass shatters on a tile floor

RTF 1.31×
lat 13.23s

10.4-sfx-animals

a dog barks, then growls and whimpers

RTF 1.29×
lat 13.04s

10.5-sfx-mechanical

an engine starting, idling, and revving up

RTF 1.38×
lat 13.90s

Failure modes

5 cases

11.1-empty

(no text)

RTF 1.34×
lat 13.47s

11.2-punctuation-only

!?...?!

RTF 1.29×
lat 12.97s

11.3-mixed-script

Hello 世界 مرحبا こんにちは namaste

RTF 1.30×
lat 13.12s

11.4-symbols

$1,234.56 (75% off) @ 3PM EST

RTF 1.25×
lat 12.59s

11.5-markdown-residue

<b>Hello</b> **world** _italic_

RTF 1.31×
lat 13.24s

Multilingual

6 cases

2.1-zh

你好,今天天气很好,适合出去散步。

RTF 1.38×
lat 13.92s

2.2-ja

こんにちは、お元気ですか?今日もいい天気ですね。

RTF 2.42×
lat 24.16s

2.3-es

http_error

Hola, ¿cómo estás hoy? Espero que muy bien.

500: {"detail":"Model worker crashed during synthesis"}
no audio

2.4-fr

Bonjour, comment allez-vous aujourd'hui?

RTF 4.62×
lat 46.54s

2.5-ar

مرحبا، كيف حالك اليوم؟ أتمنى أن تكون بخير.

RTF 1.41×
lat 14.17s

2.6-hi

नमस्ते, आप कैसे हैं? आज मौसम बहुत अच्छा है।

RTF 1.31×
lat 13.29s

Code switching

3 cases

3.1-en-zh

I'll meet you at the 茶馆 at three in the afternoon.

RTF 1.69×
lat 16.98s

3.2-en-es

She said hola and then waved goodbye.

RTF 1.25×
lat 12.63s

3.3-en-ja

The Japanese word for thank you is ありがとう.

RTF 1.27×
lat 12.76s

Voice cloning

10 cases

4.1-clone-clean-5s

clone

The quick brown fox jumps over the lazy dog.

RTF 1.29×
lat 12.98s

4.2-clone-clean-15s

clone

The quick brown fox jumps over the lazy dog.

RTF 1.31×
lat 13.20s

4.3-clone-clean-30s

clone

The quick brown fox jumps over the lazy dog.

RTF 1.31×
lat 13.19s

4.4-clone-noisy

clone

The quick brown fox jumps over the lazy dog.

RTF 1.25×
lat 12.55s

4.5-clone-accented

clone

The quick brown fox jumps over the lazy dog.

RTF 1.25×
lat 12.59s

4.6-clone-whispered

clone

The quick brown fox jumps over the lazy dog.

RTF 1.28×
lat 12.90s

4.7-clone-raspy

clone

The quick brown fox jumps over the lazy dog.

RTF 1.31×
lat 13.16s

4.8-clone-reverb

clone

The quick brown fox jumps over the lazy dog.

RTF 1.29×
lat 12.97s

4.9-clone-child

clone

The quick brown fox jumps over the lazy dog.

RTF 1.45×
lat 14.58s

4.10-clone-cross-lang

clone

你好,今天天气很好。

RTF 2.29×
lat 23.03s

Pronunciation

4 cases

6.1-irish-name

Her name is Saoirse Ronan.

RTF 1.30×
lat 13.11s

6.2-brand-hyundai

I drive a Hyundai Ioniq.

RTF 1.28×
lat 12.90s

6.3-gif-vs-jif

Save the file as a GIF and not a JPEG.

RTF 1.35×
lat 13.56s

6.4-sql

We use SQL to query the database.

RTF 1.27×
lat 12.76s

Pauses

3 cases

7.1-pause-short

Wait, [pause 0.5s] for it.

RTF 1.49×
lat 15.05s

7.2-pause-medium

She paused, [pause 1.5s] then continued.

RTF 1.26×
lat 12.71s

7.3-pause-long

And then [pause 3.0s] silence.

RTF 1.27×
lat 12.82s

Issues

sfx-fixed-output-length

low · open

All synthesized outputs are ~10.08s long regardless of prompt complexity ("glass shatters" and "engine starting, idling, and revving up" both produce 10.08s). This is upstream model behavior: moss-sfx emits a constant-length codec stream. If we need shorter SFX clips (e.g. <2s impact sounds), either the frontend trims the output or we need an upstream feature to set a target length.
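If trimming ends up on our side, it can be done without extra dependencies. The sketch below assumes the service returns uncompressed PCM WAV bytes, which this report does not confirm. The function name `trim_wav` is illustrative.

```python
import io
import wave

def trim_wav(wav_bytes: bytes, max_seconds: float) -> bytes:
    """Return a copy of a PCM WAV truncated to at most `max_seconds`."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        # Never request more frames than the clip actually has.
        keep = min(src.getnframes(), int(max_seconds * src.getframerate()))
        frames = src.readframes(keep)

    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setparams(params)    # frame count in the header is corrected on close
        dst.writeframes(frames)
    return out.getvalue()
```

A hard cut like this can click at the boundary, so a short fade-out would probably be needed for impact sounds in practice.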

sfx-transient-resource-error

low · open

First call after activation (1.1-short) returned 'Resource temporarily unavailable'. Subsequent calls succeeded immediately. Likely a worker-process / GPU allocation race at first invocation.
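Until the root cause is found, a client-side retry would mask this first-call failure. The following is a minimal sketch, not the harness's behavior: `TransientSynthesisError` is a hypothetical exception that the caller's HTTP client would raise for a 500 response carrying this message, and the attempt count and delays are arbitrary defaults.

```python
import time

class TransientSynthesisError(Exception):
    """Hypothetical error for a retryable failure such as
    'Resource temporarily unavailable' on the first call."""

def call_with_retry(fn, *args, attempts=3, base_delay_s=1.0,
                    sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs), retrying transient failures with
    exponential backoff (base_delay_s, then 2x, 4x, ...)."""
    for attempt in range(attempts):
        try:
            return fn(*args, **kwargs)
        except TransientSynthesisError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay_s * 2 ** attempt)
```

Only the transient error type is retried. Other failures, such as the worker crash seen on 2.3-es, still propagate immediately so they stay visible.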

sfx-worker-crash-on-2.3-es

medium · open

Case 2.3-es ('Hola, ¿cómo estás hoy? Espero que muy bien.') triggered 'Model worker crashed during synthesis'. Auto-respawn worked (case 2.4-fr completed, taking 46s vs the ~13s norm due to reload). Worth investigating whether specific characters (¿, accented vowels) or a heap/codec edge case caused it. This is not blocking, but it is a stability flag.
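To test the character hypothesis, the failing prompt can be replayed in a few reduced forms. The sketch below only generates the variants. Sending them to the service is left out because the client API is not described in this report, and the function names are illustrative.

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Remove combining marks, e.g. 'cómo' -> 'como'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def crash_probe_variants(prompt: str) -> dict:
    """Build reduced versions of a prompt to isolate a crashing character class."""
    no_accents = strip_accents(prompt)
    return {
        "original": prompt,
        "no_inverted_punct": prompt.replace("¿", "").replace("¡", ""),
        "no_accents": no_accents,
        # Drops anything still outside ASCII (here: the inverted question mark).
        "ascii_only": no_accents.encode("ascii", "ignore").decode("ascii"),
    }

variants = crash_probe_variants("Hola, ¿cómo estás hoy? Espero que muy bien.")
```

If every variant succeeds on a warm worker, including the original, the crash is more likely a heap or codec edge case than an input-dependent one.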

Deployment

Service

era-tts-moss

Namespace

era-core

GPU

nvidia-tesla-a100

GPU mode

gke-spot (preemptible)

PVC

era-tts-model-cache-moss

Worker timeout

1800s