How to generate voice from text using fish-speech?

Text to Voice

Clone the repository

git clone git@github.com:fishaudio/fish-speech.git
cd fish-speech

Prepare the environment

conda create -n fish-speech python=3.10
conda activate fish-speech
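Before going further, you can optionally confirm that the interpreter you are about to use is the one created above. This is a minimal sketch, assuming the environment name "fish-speech" and Python 3.10 from the commands above; check_environment is a hypothetical helper, not part of fish-speech. It relies on conda setting CONDA_DEFAULT_ENV to the name of the activated environment.

```python
import os
import sys

# Assumption: values copied from the conda commands above.
EXPECTED_ENV = "fish-speech"
EXPECTED_PYTHON = (3, 10)


def check_environment(version_info=sys.version_info, environ=os.environ):
    """Hypothetical helper: return a list of problems; an empty list means OK."""
    problems = []
    if tuple(version_info[:2]) != EXPECTED_PYTHON:
        problems.append(
            f"expected Python {EXPECTED_PYTHON[0]}.{EXPECTED_PYTHON[1]}, "
            f"got {version_info[0]}.{version_info[1]}"
        )
    # conda sets CONDA_DEFAULT_ENV to the name of the activated environment.
    active = environ.get("CONDA_DEFAULT_ENV")
    if active != EXPECTED_ENV:
        problems.append(f"expected conda env {EXPECTED_ENV!r}, got {active!r}")
    return problems
```

Calling print(check_environment() or "environment looks fine") inside the activated environment should print the success message.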

Install the Hugging Face CLI, which is used to download models from the Hugging Face Hub

pip install -U "huggingface_hub[cli]"

Download the fish-speech-1.4 model

Check the fish-speech documentation to confirm that 1.4 is still the current model version.

huggingface-cli download fishaudio/fish-speech-1.4 --local-dir checkpoints/fish-speech-1.4/
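A quick way to catch an incomplete download before running inference is to check that the decoder weights used in the audio step actually exist. This is a sketch that only checks the one file this guide names (the generator .pth); missing_checkpoint_files is a hypothetical helper, and treating an empty file as a failed download is an assumption.

```python
from pathlib import Path

# Paths taken from the commands in this guide; adjust for other model versions.
CHECKPOINT_DIR = Path("checkpoints/fish-speech-1.4")
# Decoder weights passed to tools/vqgan/inference.py in the audio step.
GENERATOR_FILE = "firefly-gan-vq-fsq-8x1024-21hz-generator.pth"


def missing_checkpoint_files(checkpoint_dir=CHECKPOINT_DIR, required=(GENERATOR_FILE,)):
    """Hypothetical helper: return required files that are absent or empty."""
    checkpoint_dir = Path(checkpoint_dir)
    missing = []
    for name in required:
        path = checkpoint_dir / name
        # Treat a zero-byte file as a failed download (assumption).
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

Run from the repository root, an empty list means the file is in place.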

From the project root, install fish-speech and its dependencies

pip3 install -e .

Generate the codes_0.npy file from text

python tools/llama/generate.py --text "Your text here" --checkpoint-path "checkpoints/fish-speech-1.4"

Decode codes_0.npy into audio (the result is saved as fake.wav)

python tools/vqgan/inference.py -i "codes_0.npy" --checkpoint-path "checkpoints/fish-speech-1.4/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
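If you run these two steps often, they can be wrapped in a small script. This is a sketch that simply replays the two commands above in order with the same arguments; build_commands and text_to_speech are hypothetical names, and it assumes you run it from the repository root inside the activated environment.

```python
import subprocess
import sys

# Paths copied from the two commands above.
CHECKPOINT_DIR = "checkpoints/fish-speech-1.4"
GENERATOR = f"{CHECKPOINT_DIR}/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"


def build_commands(text, python=sys.executable):
    """Return the two argv lists: text -> codes_0.npy, then codes_0.npy -> audio."""
    generate = [
        python, "tools/llama/generate.py",
        "--text", text,
        "--checkpoint-path", CHECKPOINT_DIR,
    ]
    decode = [
        python, "tools/vqgan/inference.py",
        "-i", "codes_0.npy",
        "--checkpoint-path", GENERATOR,
    ]
    return generate, decode


def text_to_speech(text):
    """Run both steps in order; stop at the first one that fails."""
    for argv in build_commands(text):
        # check=True raises CalledProcessError if a step exits non-zero.
        subprocess.run(argv, check=True)
```

For example, text_to_speech("Your text here") followed by play fake.wav. Passing the text as a separate list element, rather than building a shell string, avoids quoting problems with apostrophes or quotes in the input.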

Install sox, which provides the play command

yay -S sox

Play the audio

play fake.wav
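To check the output without listening to it, for example on a headless machine, you can read the WAV header with Python's standard wave module. This sketch assumes fake.wav is integer PCM, because wave cannot open floating-point WAV files. describe_wav is a hypothetical helper.

```python
import wave


def describe_wav(path="fake.wav"):
    """Return (channels, sample_rate, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return wav.getnchannels(), rate, frames / rate
```

A duration near zero usually means the generation step produced little or no audio for your text.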

Other TTS tools

Other text-to-speech tools worth checking include Festival, eSpeak, and Google Text-to-Speech (via gTTS).

Install festival

yay -S festival festival-english

Then use it

echo "Hello world" | festival --tts

Install espeak

yay -S espeak

Then use it

espeak "Hello world"

Install gTTS, a Python library and command-line tool for Google Text-to-Speech

pip install gtts

Then use it

gtts-cli "Hello, this is a Google TTS test." --lang en --output output.mp3
mpv output.mp3
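If you want a script to work on whichever of these tools happens to be installed, you can try them in order. This is a sketch: tts_command and speak are hypothetical helpers, the preference order is arbitrary, and each command mirrors the ones shown above (Festival reads the text from standard input, and gtts-cli writes an MP3 file instead of speaking).

```python
import shutil
import subprocess


def tts_command(engine, text, output="output.mp3"):
    """Return (argv, stdin_text) mirroring the commands above for one engine."""
    if engine == "festival":
        # festival --tts reads the text to speak from standard input.
        return ["festival", "--tts"], text
    if engine == "espeak":
        return ["espeak", text], None
    if engine == "gtts-cli":
        # gtts-cli only writes an MP3; play it afterwards with e.g. mpv.
        return ["gtts-cli", text, "--lang", "en", "--output", output], None
    raise ValueError(f"unknown engine: {engine}")


def speak(text, engines=("espeak", "festival", "gtts-cli")):
    """Run the first engine found on PATH; return its name, or None if none exist."""
    for engine in engines:
        if shutil.which(engine):
            argv, stdin_text = tts_command(engine, text)
            subprocess.run(argv, input=stdin_text, text=True, check=True)
            return engine
    return None
```

For example, speak("Hello world") returns "espeak" when eSpeak is installed.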