Best text to speech option for Linux

The best TTS (text-to-speech) option on Linux

Linux is a robust operating system that is perfect for programming and web design. But how does it handle text-to-speech and audio tasks? Here at Fedorum.com, we do a lot of audio recording on GNU/Linux, including text-to-speech ads for AI software projects. Instead of explaining, I’d like to start off with a little TTS sample you can listen to. The audio file below plays an mp3 file that says “Hello and welcome to fedorum.com”.

This female TTS sample was recorded on Fedora Workstation 36 on September 24, 2022.

To start working with TTS on Fedora Linux, we need to install two small packages.

Google TTS is the best option for Linux

There are a few ways to create TTS files on Linux, but based on the work we do, we have decided to use gtts instead of eSpeak or Festival. If you want to create your own TTS mp3 files, open a terminal and issue the following command:

sudo dnf install gtts python3-gtts

Once gtts and python3-gtts were installed, I created the above audio sample by issuing this command in the GNOME terminal:
gtts-cli 'hello and welcome to fedorum dot com!' --output hello.mp3

A few seconds later, a new file with the name “hello.mp3” was created. I embedded the hello.mp3 audio file at the beginning of this article. If you use a Linux distribution other than Fedora, check whether gtts and python3-gtts are available for it.
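If you need more than one phrase, the same job can be scripted through the Python library that python3-gtts installs. The following is only a minimal sketch: the function names, the tts_out folder and the file-naming scheme are my own assumptions, not something gtts prescribes. It uses the library's gTTS class and its save() method, and like gtts-cli it needs network access because the speech is generated by Google.

```python
import re
from pathlib import Path


def phrase_to_filename(phrase: str) -> str:
    """Turn a phrase into a safe mp3 file name (hypothetical naming scheme)."""
    slug = re.sub(r"[^a-z0-9]+", "_", phrase.lower()).strip("_")
    return (slug or "tts") + ".mp3"


def synthesize_all(phrases, out_dir="tts_out", lang="en"):
    """Create one mp3 per phrase with gTTS and return the written paths.

    Needs the gtts package and network access, so it is not called here.
    """
    # Imported inside the function so the helper above works without gtts.
    from gtts import gTTS

    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for phrase in phrases:
        path = target / phrase_to_filename(phrase)
        gTTS(text=phrase, lang=lang).save(str(path))
        written.append(path)
    return written
```

Calling synthesize_all(["hello and welcome to fedorum dot com!"]) would write tts_out/hello_and_welcome_to_fedorum_dot_com.mp3, assuming gtts is installed and the machine is online.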

USB Audio Interface

Two of the workstations here at Fedorum.com have dedicated USB audio interfaces. For text-to-speech work, I use the Steinberg UR 22 mkII audio interface, which works perfectly with Fedora Linux.

Audacity for audio editing

Audacity showing the mono audio sample for “welcome to fedorum.com”

If you use Fedora and installed the two packages I mentioned earlier, then you can create your own text-to-speech files. To adjust the speed of the spoken phrase, I use an audio editing program. Linux has several audio editing programs but for small jobs, like working with TTS files, Audacity is fine.
To install Audacity, type this command into the GNOME terminal:
sudo dnf install audacity

For more information about gtts and audacity, you can type the following commands into the terminal:

  1. dnf info gtts
  2. dnf info audacity

If you need professional features, you can check out Reaper, which is an easy-to-use digital audio workstation (DAW). If you have experience working with a DAW, Reaper is definitely worth exploring.

Make TTS sound more natural by adjusting the pitch and tempo

Audacity has tons of built-in effects. To change the speed or pitch of a text-to-speech recording, use:
Effect > Change Pitch or Tempo. Always use the Preview button to listen before finalizing the change.

Increasing the pitch is useful when the voice needs to sound more like a child.
Lowering the pitch makes the speaker sound more natural. I use a setting of -8 to -12 (Audacity > Effect > Change Pitch).

As always, less is more. Take your time and listen as you experiment with the different values. The sweet spot for lowering the pitch of the female voice is around the -10 mark.
The same is true for changing the tempo. Speeding up a recording slightly makes it sound more natural.

TIP! To trick TTS into saying “fedorum.com”, I spelled it out like so: “fedorum dot com”. I always improvise with the spelling and the placement of commas and exclamation marks. Lastly, if the TTS cannot pronounce a certain word, I often use an alternate one. It is better to get a clear and natural-sounding voice than to force the AI to say something that sounds off.
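The “dot com” trick can be automated if you feed many phrases to the TTS. Here is a rough sketch, assuming that a dot sitting between a word and a run of two or more letters belongs to a domain name. The function name and the regular expression are my own and will not cover every case, so always listen to the result.

```python
import re

# A dot directly between a word character and a run of 2+ letters,
# as in "fedorum.com". A full stop followed by a space does not match.
_DOT_IN_NAME = re.compile(r"(?<=\w)\.(?=[A-Za-z]{2,}\b)")


def make_speakable(text: str) -> str:
    """Spell out dots inside domain-like names so the TTS reads them aloud."""
    return _DOT_IN_NAME.sub(" dot ", text)


print(make_speakable("Hello and welcome to fedorum.com!"))
```

Ordinary sentence endings are left alone because the dot has to be followed directly by letters.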

Audacity Macro Manager
If you create a lot of TTS files then you should take a moment and familiarize yourself with the Macro Manager that comes with Audacity. Creating macros to do repetitive tasks will give you more time to create.

For veteran and/or experienced Linux users

For experienced Linux users, ffmpeg is another option to quickly do what Audacity can do. Here is a quick refresher. If you look at the mp3 sample (top of this post), you will see that it is 3 seconds long (note the 0:03 next to the play button). To play it in the terminal with ffplay, the player from the FFmpeg project, download this mp3 file and use:
ffplay -autoexit -t 3 linux_best_tts_female_voice.mp3
The above command will play the file right in the GNOME terminal. The crucial value is -t 3, which limits playback to 3 seconds, the length of the audio file. Thanks to -autoexit, ffplay will exit automatically when playback ends.

Use ffmpeg to speed up mp3
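ffmpeg -i "test4b.mp3" -vf "setpts=(PTS-STARTPTS)/1.5" -af atempo=1.5 "test4c.mp3"
The above command will speed up the recording by 1.5 times. A value of less than 1 will have the opposite effect. Note that there are two occurrences of this value: atempo changes the speed of the audio, while setpts only applies to a video stream, so for a plain mp3 file the -af atempo part does the work.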
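One thing to keep in mind with atempo: it does not accept values below 0.5, and the FFmpeg documentation suggests chaining several atempo filters when you need a large change. The sketch below builds such a chain for any speed factor. The function name is made up for this example, and keeping every step between 0.5 and 2.0 is a conservative choice of mine.

```python
def atempo_chain(factor: float) -> str:
    """Return an -af filter string whose atempo steps multiply to `factor`.

    Every step stays within 0.5 and 2.0.
    """
    if factor <= 0:
        raise ValueError("speed factor must be positive")
    steps = []
    while factor > 2.0:
        steps.append(2.0)
        factor /= 2.0
    while factor < 0.5:
        steps.append(0.5)
        factor /= 0.5
    steps.append(factor)
    return ",".join(f"atempo={step:g}" for step in steps)


print(atempo_chain(1.5))
print(atempo_chain(4))
```

The returned string goes after -af, for example: ffmpeg -i in.mp3 -af "atempo=2,atempo=2" out.mp3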

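Use ffmpeg to change the pitch
ffmpeg -i test.mp3 -af "asetrate=44100*0.5,aresample=44100,atempo=1/0.5" test_1.mp3

The crucial value is the 44100*0.5, which determines the pitch: asetrate plays the samples back at half the rate, which lowers the pitch and slows the audio down, and atempo=1/0.5 then restores the original tempo. Don’t forget to adjust both occurrences, and feel free to experiment with values other than 0.5. The command assumes that the input file has a sample rate of 44100 Hz.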
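In an asetrate/atempo chain like the one above, the factor that scales the sample rate shifts the pitch, and atempo brings the speed back. If you prefer to think in semitones, a shift of s semitones corresponds to a factor of 2^(s/12), so -12 semitones (one octave down) gives the 0.5 used in the command. Here is a small sketch that builds the filter string. The function name and the idea of passing semitones are my own assumptions, and the sample rate has to match your input file.

```python
def pitch_filter(semitones: float, sample_rate: int = 44100) -> str:
    """Build an -af string that shifts the pitch and keeps the tempo.

    sample_rate must be the real sample rate of the input file.
    """
    factor = 2 ** (semitones / 12)           # pitch ratio for the shift
    shifted_rate = round(sample_rate * factor)
    return (
        f"asetrate={shifted_rate},"          # changes pitch and speed together
        f"aresample={sample_rate},"          # back to the original sample rate
        f"atempo={1 / factor:.6f}"           # undo the speed change
    )


print(pitch_filter(-12))
print(pitch_filter(-2))
```

For shifts beyond one octave, 1/factor leaves the 0.5 to 2.0 range and the atempo part would need to be split into several steps.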

As always, && is your friend should you need to combine the two commands.
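An alternative to chaining two ffmpeg runs with && is to put all audio filters into a single -af chain, so the mp3 is only re-encoded once. Below is a minimal sketch that assembles such a command as an argument list. The function name and the file names are placeholders, and nothing is executed here.

```python
import shlex


def build_ffmpeg_argv(src, dst, audio_filters):
    """Return the argument list for one ffmpeg run that applies all filters in order."""
    if not audio_filters:
        raise ValueError("need at least one audio filter")
    return ["ffmpeg", "-i", src, "-af", ",".join(audio_filters), dst]


argv = build_ffmpeg_argv(
    "test.mp3",
    "test_out.mp3",
    ["asetrate=44100*0.5", "aresample=44100", "atempo=1/0.5", "atempo=1.5"],
)
# Print a copy-and-paste friendly command line with proper shell quoting.
print(shlex.join(argv))
```

To actually run it from Python, pass the list to subprocess.run(argv, check=True).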
For most people, Audacity is much easier to use but if time is money, then ffmpeg is a good option. Fedora users can install ffmpeg by issuing this command:
sudo dnf install ffmpeg-free
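Before scripting any of the steps in this article, it can help to check that the command-line tools are actually on your PATH. This is a small sketch using Python's standard library. The list of tool names simply mirrors the programs mentioned in this article, and your distribution may package them differently.

```python
import shutil

# Programs used in this article; adjust the names for your distribution.
TOOLS = ("gtts-cli", "audacity", "ffmpeg", "ffplay")


def find_tools(names=TOOLS):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_tools().items():
    print(f"{name}: {path or 'not found'}")
```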

Final thoughts

Right now, we are working on a software program called “Virtual Music Teacher”, and having TTS available on Linux is a real time saver. Speaking of time, Linux has come a long way. 20 years ago, audio work was mostly done on Macs, but now anything can be done with the powerful GNU/Linux operating system. If you want to find out more about the increasingly popular Fedora distribution, check out why we have picked it as our daily driver.

Finally, here is the closing sentence spoken out loud. I have edited the pitch and tempo of the mp3 as explained above. The audio sample at the top of this blog post and the one below feature the same speaker. The only difference is in the tweaking of pitch and speed.

I couldn’t have said it better. Thank you for your time. 🙂