How to set up TTS on Arch Linux

By default, Arch Linux doesn’t come with a Text-to-Speech (TTS) engine. There is also no speech dispatcher, so in browsers you will see an empty list of voices:

speechSynthesis.getVoices() // []
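You can confirm the same thing from a shell. A rough check on a stock install (package names match the ones used later in this guide):

pacman -Qs speech-dispatcher   # no output means the dispatcher is not installed
pacman -Qs piper               # same check for the TTS engine itself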

The best-known TTS engine is espeak-ng, but it is also quite old and sounds like a robot.

You can try flite, festival, or piper; the last one is the best choice.

| Feature | eSpeak | Flite | Festival | Piper |
|---|---|---|---|---|
| Voice Quality | Robotic, synthetic | Basic, robotic | Moderate, semi-natural | Natural, near-human |
| Languages | Over 50 | Primarily English | Limited but extendable | Growing, high-quality |
| Resource Usage | Very low | Extremely low | Moderate | Higher, needs decent hardware |
| Customization | Moderate (pitch, speed) | Limited | High | Limited, neural training needed |
| Best For | Accessibility, low-resource devices | IoT, small embedded devices | Research, customizable apps | Quality-critical apps, voice assistants |

Summary:

I assume that you need high sound quality and you have enough resources, so I will show you how to set up piper.

Let’s start with speech-dispatcher, the middleware between a TTS engine and applications. It allows browsers to talk to the TTS engine.

yay -S speech-dispatcher

Now we install the piper engine

yay -S piper-tts-bin

and the English (US) voices

yay -S piper-voices-en-us
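To see which voice models that package actually installed, a quick pacman query is enough (no assumptions beyond the package name above):

pacman -Ql piper-voices-en-us | grep '\.onnx$'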

To create a configuration in interactive mode, run

spd-conf
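spd-conf writes a per-user configuration; if you want to inspect what it generated, the usual location is ~/.config/speech-dispatcher/ (adjust the path if your setup differs):

grep -vE '^#|^$' ~/.config/speech-dispatcher/speechd.conf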

Now you can use the spd-say command to test the setup

spd-say "Arch Linux is the best"

Other languages

Sometimes you need to say something in another language. Piper offers many voices, which you can preview on the official website or on the samples website.

Unfortunately, in the AUR you will find only 6 voice packages.

I couldn’t find any script for this, but I figured out that if you go to /usr/share/piper-voices you will see:

├── en
│   └── en_US
│       ├── amy
│       │   ├── low
│       │   │   ├── ALIASES
│       │   │   ├── MODEL_CARD
│       │   │   ├── en_US-amy-low.onnx
│       │   │   └── en_US-amy-low.onnx.json
│       │   └── medium
│       │       ├── MODEL_CARD
│       │       ├── en_US-amy-medium.onnx
│       │       └── en_US-amy-medium.onnx.json

This is quite similar to the structure described in the documentation.
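Given that layout, here is a rough sketch of pulling in a voice by hand, assuming the upstream Hugging Face repository (rhasspy/piper-voices) uses the same directory scheme; the Polish darkman voice below is the one used in the rest of this guide:

VOICE="pl/pl_PL/darkman/medium/pl_PL-darkman-medium"
BASE="https://huggingface.co/rhasspy/piper-voices/resolve/main"
sudo mkdir -p "/usr/share/piper-voices/$(dirname "$VOICE")"
sudo curl -L -o "/usr/share/piper-voices/$VOICE.onnx" "$BASE/$VOICE.onnx"
sudo curl -L -o "/usr/share/piper-voices/$VOICE.onnx.json" "$BASE/$VOICE.onnx.json"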

So I installed the Polish voice by copying the logic from the existing AUR packages’ PKGBUILDs and published it as:

https://aur.archlinux.org/packages/piper-voices-pl-pl
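With that package published, the manual steps above collapse to a single install (package name taken from the AUR link above):

yay -S piper-voices-pl-pl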

Now we can test:

echo "Cześć, to jest test języka polskiego." | piper-tts --model /usr/share/piper-voices/pl/pl_PL/darkman/medium/pl_PL-darkman-medium.onnx --output_file welcome.wav

and

mpv welcome.wav

It works perfectly.
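If you would rather skip the intermediate file, piper can also stream raw audio; this assumes your build supports --output_raw and that medium voices use a 22050 Hz sample rate (check the voice’s .onnx.json if unsure):

echo "Cześć, to jest test języka polskiego." | piper-tts --model /usr/share/piper-voices/pl/pl_PL/darkman/medium/pl_PL-darkman-medium.onnx --output_raw | aplay -r 22050 -f S16_LE -t raw -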

Then I edited the speech-dispatcher module configuration:

sudo nano /etc/speech-dispatcher/modules/piper-generic.conf

and added

AddVoice "pl" "MALE1"    "pl/pl_PL/darkman/medium/pl_PL-darkman-medium"

Now

spd-say -L

lists two languages, but

spd-say -l pl "Powściągliwość"

sounds like “Powcilgliwo” (the ś, ą, and ść are skipped), while

echo "Powściągliwość" | piper-tts --model /usr/share/piper-voices/pl/pl_PL/darkman/medium/pl_PL-darkman-medium.onnx --output_file /tmp/welcome.wav && aplay /tmp/welcome.wav 

is correct. This means we are losing Polish characters when the dispatcher passes the sentence to the TTS engine.

To fix it, add the line

GenericLanguage		   "pl" "pl" "utf-8"

to /etc/speech-dispatcher/modules/piper-generic.conf.
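The dispatcher reads module configs at startup, so after editing the file restart it and retest; since speech-dispatcher is usually autospawned, simply killing the running daemon is a blunt but sufficient way to do that:

pkill speech-dispatcher
spd-say -l pl "Powściągliwość"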

For the Russian language I installed the voice and tested it:

echo "1111" | piper-tts --model /usr/share/piper-voices/ru/ru_RU/denis/medium/ru_RU-denis-medium.onnx --output_file /tmp/welcome.wav && aplay /tmp/welcome.wav

then added

GenericLanguage   "ru" "ru_RU" "utf-8"
AddVoice "ru" "MALE1"    "ru/ru_RU/denis/medium/ru_RU-denis-medium"

to /etc/speech-dispatcher/modules/piper-generic.conf, and now your computer speaks Russian:

spd-say -l ru "111" 

I have described how to install languages manually, but afterwards I decided to release packages for more languages.

You can check them on the AUR
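A quick way to list them, assuming all of the voice packages share the piper-voices prefix:

yay -Ss piper-voices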

And test them with commands like:

spd-say -l es "Un arcoíris o arco iris es un fenómeno óptico"
echo "Un arcoíris o arco iris es un fenómeno óptico" | piper-tts --model /usr/share/piper-voices/es/es_ES/sharvard/medium/es_ES-sharvard-medium.onnx --output_file /tmp/welcome.wav && aplay /tmp/welcome.wav

Sources: