StyleTTS2
Style-based text-to-speech achieving human-level naturalness through style diffusion and adversarial training with prosody modeling.
About
StyleTTS2 is a text-to-speech model that achieves human-level naturalness by combining style diffusion with adversarial training. It treats speech style as a latent random variable that is sampled for the input text rather than fixed in advance, which lets it produce expressive, natural-sounding output.
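The idea of treating style as a latent random variable can be illustrated with a toy sketch. This is not the StyleTTS2 model or its API: the diffusion sampler is replaced here by a plain Gaussian draw around a text-dependent mean, and `text_to_mean`, `sample_style`, `dim`, and `sigma` are hypothetical names chosen only for illustration. It shows that the same text can yield a different style vector on each draw.

```python
import random


def text_to_mean(text, dim=4):
    """Hypothetical stand-in for a learned text encoder.

    Maps the text deterministically to a small vector in [0, 1).
    """
    codes = [ord(c) for c in text] or [0]
    return [(sum(codes[i::dim]) % 100) / 100.0 for i in range(dim)]


def sample_style(text, rng, dim=4, sigma=0.1):
    """Draw one style vector for `text`.

    Assumption: a Gaussian around a text-dependent mean stands in for
    the diffusion-based sampler used by the real model.
    """
    mean = text_to_mean(text, dim)
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]


rng = random.Random(0)
style_a = sample_style("Hello there.", rng)
style_b = sample_style("Hello there.", rng)

# Same text, two draws: the vectors share a mean but differ in detail.
print(style_a)
print(style_b)
```

In a real system, each sampled vector would condition the speech decoder, so different draws would correspond to different renderings of the same sentence.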