F5-TTS

SWivid

Flow-matching based text-to-speech with natural prosody and zero-shot voice cloning from short audio samples.

About

F5-TTS is a text-to-speech system based on flow matching that produces speech with natural-sounding prosody. It supports zero-shot voice cloning: given a short audio sample, it synthesizes speech in that voice without fine-tuning.

Deployment Options

1 stack
