MOSS-VoiceGenerator 1.7B

1.7B · pending
MOSS-TTS · moss_tts_delay · Apache 2.0 · 🤗 OpenMOSS-Team/MOSS-VoiceGenerator

Performance

Success rate

100% (17/17)

Cold start

139s (activation → loaded)

RTF (median)

Latency (median)

1.7B model on the same A100 pod. Cold start took 139s, slower than moss-tts-local (45s, same 1.7B size), because the weights were downloaded from HF for the first time rather than read from the PVC cache. Phase A (15 voice-design cases) ran in 1m44s, a mean of ~7s/case. Phase B (the chained tests) required a model swap to moss-tts-d (~170s reload), then ran at ~8-9s/case. All 17 cases succeeded after a harness fix that adds the `model` field to /api/v1/tts/generate requests.
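The figures quoted above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only numbers stated in this report; the helper name `parse_duration` is hypothetical and is not part of the harness.

```python
def parse_duration(s: str) -> int:
    """Convert a duration such as '1m44s' or '139s' to whole seconds."""
    minutes, _, rest = s.partition("m") if "m" in s else ("0", "", s)
    return int(minutes) * 60 + int(rest.rstrip("s") or 0)


# Values copied from the report text above.
phase_a_total = parse_duration("1m44s")   # Phase A wall time
phase_a_cases = 15                        # voice-design cases
cold_start = parse_duration("139s")       # this model
cold_start_cached = parse_duration("45s") # moss-tts-local, same size

mean_per_case = phase_a_total / phase_a_cases
download_penalty = cold_start - cold_start_cached
success_rate = 17 / 17

print(f"Phase A mean: {mean_per_case:.1f}s/case")        # 6.9s/case
print(f"Cold-start gap vs cached: {download_penalty}s")  # 94s
print(f"Success rate: {success_rate:.0%}")               # 100%
```

The 94s gap is the difference between the two cold starts; the report attributes it to the first-time weight download.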

Capabilities

Base En

pending listening
Sample text in 12/15 cases is English; baseline intelligibility judged on listen-back.

Sound Effects

Out of scope
not supported

Failure Modes

Out of scope
out of scope

Multilingual

pending listening
vg-11 (Chinese) and vg-17 (English+Chinese, chained): confirm pronunciation.

Code Switching

pending listening
Implicit in vg-17 (English [S1] + Mandarin [S2]).

Cloning

Out of scope
not supported
VoiceGenerator designs voices from text descriptions, not from reference audio.

Voice Design

pending listening
The model's primary capability. 15 designs span accent/age/gender/emotion/role/lang. Verdict pending listen-back.

Pronunciation

Out of scope
out of scope

Pauses

Out of scope
out of scope

Streaming

Out of scope
not supported

Dialogue

pending listening
Tested via vg-16/vg-17, which feed VG-designed presets into moss-tts-d dialogue synthesis.

Long Form

Out of scope
out of scope
Voice-design samples are short by design (one sentence).

Showcase / extended cases

accent-australian

1 case

vg-06-aussie-female

extclone

Reckon we should head down to the beach this arvo? The weather's brilliant.

accent-irish

1 case

vg-07-irish-storyteller

extclone

It was a dark and stormy night, you see, and the lighthouse keeper had gone to bed early.

age-child

1 case

vg-05-cheerful-child

extclone

Look mom, I drew a picture for you! It has a rainbow and a unicorn!

age-elderly-female

1 case

vg-01-warm-grandmother

extclone

Hello dear, would you like some tea? I've just baked some scones.

age-elderly-male

1 case

vg-14-raspy-old-man

extclone

Many years ago, in these very woods, something strange happened to my grandfather.

Chained cross-model

2 cases

vg-16-chain-grandmother-dialogue

extclone

[S1] Hello dear, would you like some tea? [S2] Yes please, Grandma! Can I have biscuits too?

Reuse the grandmother preset designed by moss-voice-generator in a multi-speaker dialogue via moss-tts-d. Verifies the voice-design → preset → cross-model synthesis pipeline.

vg-17-chain-anchor-multilingual

extclone

[S1] Good evening. We have breaking news from Beijing. [S2] 大家好,这里是北京现场报道。

Reuse the news-anchor preset in an English+Chinese dialogue via moss-tts-d. Verifies multilingual carry-over of designed voices.
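The chained cases pass dialogue text with `[S1]`/`[S2]` speaker tags. For harness-side checks (for example, confirming that each turn maps to the intended preset), the text can be split into per-speaker turns. This is a minimal sketch; `split_turns` is a hypothetical helper, and how moss-tts-d itself interprets the tags is not documented here.

```python
import re

# Matches tags such as [S1] or [S2] and captures the speaker label.
_SPEAKER_TAG = re.compile(r"\[(S\d+)\]\s*")


def split_turns(text: str) -> list[tuple[str, str]]:
    """Split '[S1] ... [S2] ...' into (speaker, utterance) pairs."""
    parts = _SPEAKER_TAG.split(text)
    # parts[0] is any text before the first tag; after that the list
    # alternates speaker label, utterance, speaker label, utterance, ...
    return [
        (speaker, utterance.strip())
        for speaker, utterance in zip(parts[1::2], parts[2::2])
    ]


vg17 = "[S1] Good evening. We have breaking news from Beijing. [S2] 大家好,这里是北京现场报道。"
for speaker, utterance in split_turns(vg17):
    print(speaker, "->", utterance)
```

Splitting on the tag keeps the utterance text untouched, so mixed English and Chinese turns such as vg-17 survive without any re-encoding.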

emotion-calm

1 case

vg-03-meditation-coach

extclone

Breathe in deeply, then slowly exhale. Let your shoulders relax.

emotion-sad

1 case

vg-09-sad-breathy

extclone

I just need a little time alone tonight. Please don't take it personally.

lang-chinese

1 case

vg-11-chinese-gentle

extclone

你好,今天感觉怎么样?希望你度过愉快的一天。

role-announcer

1 case

vg-02-radio-announcer

extclone

And now, the top stories of the morning, coming up right after this.

role-authoritative

1 case

vg-04-military-cmdr

extclone

Attention. Form ranks immediately. We move out at oh six hundred.

role-news

1 case

vg-13-news-anchor

extclone

Good evening. I'm reporting live from the capital where breaking news is unfolding.

role-service

1 case

vg-15-customer-service

extclone

Thank you for calling. How may I help you today? I'd be happy to assist.

role-sports

1 case

vg-10-sports-comm

extclone

He shoots, he scores! What a goal in the final minute of the match!

special-asmr

1 case

vg-12-asmr-whisper

extclone

Let me whisper a little story to you tonight. Close your eyes and listen.

special-robotic

1 case

vg-08-robotic-ai

extclone

Processing your request. Please stand by. Estimated time, twelve seconds.


Issues

vg-chained-needs-model-field

low · fixed

The initial harness run hit a 404 on /api/v1/tts/generate for vg-16/vg-17 because the POST body omitted `model`; the orchestrator then defaulted to 'orpheus-3b', which isn't registered on this pod. Fixed in scripts/run_voice_design_tests.py: synth_with_preset() now requires `model=` and passes it through. Both chained cases succeeded on direct retry.
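One way to make the `model` requirement hard to miss is a keyword-only argument with no default, so a call that omits it fails locally instead of reaching the orchestrator. The sketch below is an illustration and not the actual code in scripts/run_voice_design_tests.py: only the endpoint path and the `model` field come from this report, while the `text` and `preset` field names, the `base_url` parameter, the `build_tts_payload` helper, and the byte-string return value are assumptions.

```python
import json
import urllib.request


def build_tts_payload(text: str, preset: str, *, model: str) -> dict:
    """Build the POST body. `model` is keyword-only and has no default."""
    if not model:
        raise ValueError("model must be a non-empty string")
    # Assumption: `text` and `preset` are illustrative field names.
    return {"model": model, "text": text, "preset": preset}


def synth_with_preset(
    base_url: str, text: str, preset: str, *, model: str, timeout: float = 900.0
) -> bytes:
    """POST to /api/v1/tts/generate and return the raw response body."""
    payload = build_tts_payload(text, preset, model=model)
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/tts/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Assumption: the client timeout mirrors the 900s worker timeout.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

With this signature, `synth_with_preset(url, text, preset)` raises `TypeError` before any request is sent, which would have surfaced the missing field without relying on the orchestrator's default.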

Deployment

Service

era-tts-moss

Namespace

era-core

GPU

nvidia-tesla-a100

GPU mode

gke-spot (preemptible)

PVC

era-tts-model-cache-moss

Worker timeout

900s