What are Neural Voices?

AWS, Google, and Microsoft each offer two types of voices: “Standard” and “Neural”. Neural voices are generated by AI-based (neural network) models and generally sound much more natural than standard voices.

How to use neural voices

Because neural voices cost significantly more than standard voices, they are not available in the “Authoring” plan. To use neural voices in a project, the project must be created with a “Production” account. After that, everyone invited to the project can use the neural voices, even if they only subscribe to Frazier “Authoring”. So it matters who creates the project!

How do I recognize a neural voice?

In a direct comparison between a standard voice and a neural voice, you will quickly notice that the neural voice makes significantly fewer pronunciation errors and has better sound quality. Depending on the provider, you can also identify neural voices by name:

AWS

For Amazon Polly, it is difficult to maintain a fixed list of names, because the list can change daily and standard voices are sometimes converted into neural voices. Please check this list of neural voices. If a voice in Frazier is not listed there, it is a standard voice.

Google

Standard voices: All standard voices can be recognized by their name, for example “de-DE-Standard-*” for German. The asterisk stands for a letter; currently the letters run from A to F.
Neural voices: Neural voices can also be recognized by their name, which contains “Wavenet”, “Neural2” or “Polyglot”.
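The naming rule above can be applied mechanically. The following Python sketch classifies a Google voice name using only the markers listed in this article. The function name google_voice_type is hypothetical and not part of Frazier or Google's API; any name that matches neither rule is reported as "unknown" rather than guessed.

```python
def google_voice_type(name: str) -> str:
    """Classify a Google voice name as "standard", "neural" or "unknown".

    Hypothetical helper based only on the naming rules in this article.
    """
    # Split e.g. "de-DE-Wavenet-B" into ["de-DE", "Wavenet", "B"]
    parts = name.rsplit("-", 2)
    if len(parts) != 3:
        return "unknown"
    voice_family = parts[1]
    if voice_family == "Standard":
        return "standard"
    if voice_family in ("Wavenet", "Neural2", "Polyglot"):
        return "neural"
    # Anything else is not covered by the rules above
    return "unknown"


print(google_voice_type("de-DE-Wavenet-B"))  # neural
```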

Tip: A special feature of Google is that voices ending in the same letter are recordings of the same speaker. As a result, the standard and neural variants of a voice behave very similarly. A manuscript created with “de-DE-Standard-B” can later be read out with “de-DE-Wavenet-B” without any major changes.
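Because the speaker letter is shared, switching from a standard voice to its WaveNet counterpart is a simple name substitution. The sketch below shows that substitution; standard_to_wavenet is a hypothetical helper, and it assumes (without checking) that a WaveNet voice with the same letter exists for the language in question.

```python
def standard_to_wavenet(name: str) -> str:
    """Map e.g. "de-DE-Standard-B" to "de-DE-Wavenet-B" (same speaker letter).

    Hypothetical helper: it does not verify that the WaveNet voice exists.
    """
    # Split into locale ("de-DE"), voice family ("Standard") and letter ("B");
    # a name with too few hyphens also raises ValueError here.
    locale, family, letter = name.rsplit("-", 2)
    if family != "Standard":
        raise ValueError(f"not a Google standard voice: {name}")
    return f"{locale}-Wavenet-{letter}"


print(standard_to_wavenet("de-DE-Standard-B"))  # de-DE-Wavenet-B
```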

Microsoft

Standard voices: All of Microsoft’s standard voices lack the word “Neural” in the name.
Neural voices: All of Microsoft’s neural voices have the word “Neural” in their name.
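For Microsoft, the rule reduces to a single substring check. The sketch below is a hypothetical helper, not part of Frazier or the Azure SDK; the voice names in the example are illustrative.

```python
def microsoft_voice_type(name: str) -> str:
    """Classify a Microsoft voice name per the rule above.

    Neural voices contain "Neural" in their name; standard voices do not.
    """
    return "neural" if "Neural" in name else "standard"


print(microsoft_voice_type("de-DE-KatjaNeural"))  # neural
```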

Standard voices will soon be removed!

Attention: Microsoft is retiring its standard voices on August 31, 2024. Here is the original message we received:

“You’re receiving this email because you may be using standard voices of Text-to-Speech, a capability of Speech service within Azure Cognitive Services. Text-to-Speech currently supports both standard and neural voices. However, since the neural voices provide more natural sounding speech output, and thus, a better end-user experience, we’re retiring the standard voices on 31 August 2024 and they’ll no longer be supported after that date.” – Microsoft

So it is best to use only Microsoft’s neural voices from now on. Then there won’t be any nasty surprises next year.
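If you keep a list of the Microsoft voices used in your manuscripts, you can check which ones are affected by the retirement. The Python sketch below assumes the naming rule from this article ("Neural" in the name means neural) and the retirement date from Microsoft's notice; is_supported and voices_to_replace are hypothetical helpers, and the example voice names are illustrative.

```python
from datetime import date

# Retirement date taken from Microsoft's notice quoted above.
STANDARD_RETIREMENT = date(2024, 8, 31)


def is_supported(voice_name: str, on: date) -> bool:
    """Return True if the Microsoft voice should still be supported on the given date."""
    if "Neural" in voice_name:
        return True  # neural voices are not affected by the retirement
    # Standard voices are no longer supported after 31 August 2024.
    return on <= STANDARD_RETIREMENT


def voices_to_replace(voice_names: list[str], on: date) -> list[str]:
    """List the voices that will no longer be supported on the given date."""
    return [v for v in voice_names if not is_supported(v, on)]


print(voices_to_replace(["de-DE-KatjaNeural", "de-DE-Hedda"], date(2024, 9, 1)))
# ['de-DE-Hedda']
```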

ElevenLabs

Frazier also has a plugin for integrating ElevenLabs. This latest-generation text-to-speech engine offers a variety of new voices and also lets you clone your own voice.

Do you have an ElevenLabs account that you want to use in Frazier? Please contact support to discuss the integration into your user account.

Last Update: 2023/10/01
