A Simple Script for Kokoro-82M Text-to-Speech

· cyclicircuit's blog


Recently I was involved in a discussion about text-to-speech, and I advocated for a model I really like called Kokoro-82M, which I use all the time to generate audio of articles. However, the model is only available as a Python library and is primarily meant to be run from another script or a Jupyter Notebook.

Thing is, it's not even that hard to write a simple wrapper script around it, especially since this open-source project already exists and we can copy its approach: https://github.com/santinic/audiblez/. Notably, audiblez is designed for audiobooks, but there's no reason we can't use the same approach with any text.

You can see the script here: https://cyclicircuit.pastes.sh/kokoro-tts.py

Installation (Lazy)

I am a big fan of uv; if you're not using it, you should. The first step to using this script is to install uv. After that, download the script and give it execute permissions, and it should just work: the script uses uv in its "crunchbang" (shebang) line to install all of its dependencies when it's run. It's very convenient, but might be just a tad slow the first time.

# after installing uv, download the script and make it executable:
wget https://cyclicircuit.pastes.sh/raw/kokoro-tts.py
chmod +x kokoro-tts.py
./kokoro-tts.py --help
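
If the uv trick is new to you, here is a minimal sketch of what such a self-installing script can look like. It relies on uv's support for inline script metadata (PEP 723). The empty dependency list, the defaults, and the placeholder body are assumptions for illustration; this is not the contents of kokoro-tts.py, which would list its actual TTS packages in the header.

```python
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10"
# dependencies = []
# ///
"""Skeleton of a uv-run script that reads text on stdin (illustrative only)."""
import argparse
import sys


def parse_args(argv=None):
    # The flag names mirror the ones used in this post; the defaults are made up.
    parser = argparse.ArgumentParser(description="Read text from stdin.")
    parser.add_argument("--output", default="output.mp3", help="file to write")
    parser.add_argument("-v", dest="voice", default="bm_george", help="voice name")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Nothing piped in: avoid blocking on an interactive terminal.
    text = "" if sys.stdin.isatty() else sys.stdin.read()
    # Placeholder: a real script would synthesize audio here.
    print(f"would write {len(text)} characters to {args.output} as {args.voice}")


if __name__ == "__main__":
    main()
```

When a file like this is executable, uv reads the comment block at the top, prepares an environment with the listed dependencies, and then runs the script, which is why the first run can be slow.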

Usage

All you have to do is just pipe in text, and you get audio out. The --help parameter is your friend, but this example should work:

echo "Hello there, and who might you be?" | ./kokoro-tts.py --output test.mp3 -v bm_george

This will output the audio to a file called test.mp3 using the British "george" voice.
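
If you would rather call the script from Python than from a shell, the same invocation can be sketched with subprocess. The helper names tts_command and speak are hypothetical, and the sketch assumes kokoro-tts.py sits in the current directory and accepts the --output and -v flags shown above.

```python
import subprocess


def tts_command(output, voice, script="./kokoro-tts.py"):
    """Build the argument list for one kokoro-tts.py run."""
    return [script, "--output", output, "-v", voice]


def speak(text, output, voice="bm_george", script="./kokoro-tts.py"):
    """Pipe text to the script on stdin, like `echo ... | ./kokoro-tts.py`."""
    # check=True raises CalledProcessError if the script exits non-zero.
    subprocess.run(tts_command(output, voice, script), input=text, text=True, check=True)
```

Calling speak("Hello there, and who might you be?", "test.mp3") would then be the equivalent of the shell one-liner.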

Advanced Usage

Generate Audio for Articles

OK so, we have our awesome script, and we wanna read some Wikipedia article, but we're just too lazy... What do?

UNIX to the rescue! We don't have to write any code, we can just glue some programs together to do what we want.

First, you will have to get an account with https://extractorapi.com. They have a free tier that gives you 1000 API calls per month, which is probably more than you need anyway. Once you get an account, you will be given an API token, which you can then use like this:

curl 'https://extractorapi.com/api/v1/extractor/?apikey=<API-TOKEN>&url=<URL>' | jq -r .text

Obviously you will need to have curl and jq installed, and you'll need to replace <API-TOKEN> with your token and <URL> with something useful, like https://en.wikipedia.org/wiki/Speech_synthesis. The -r flag tells jq to print the raw text rather than a JSON-quoted string.
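
The same request can be sketched in Python with only the standard library, which also takes care of URL-encoding the article address (worth doing once article URLs contain their own ? or & characters). The function names are hypothetical; the endpoint, the apikey and url parameters, and the text field are taken from the curl command above, and the sketch assumes the API decodes query parameters in the usual way.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EXTRACTOR_ENDPOINT = "https://extractorapi.com/api/v1/extractor/"


def extractor_url(api_token, article_url):
    """Return the request URL with both parameters percent-encoded."""
    return EXTRACTOR_ENDPOINT + "?" + urlencode({"apikey": api_token, "url": article_url})


def text_from_response(body):
    """Pull the article text out of the JSON body (the field jq selects with .text)."""
    return json.loads(body)["text"]


def fetch_article_text(api_token, article_url):
    """Download and return the extracted text; needs network access and a valid token."""
    with urlopen(extractor_url(api_token, article_url)) as response:
        return text_from_response(response.read().decode("utf-8"))
```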

Now that we know how to easily get text out of an online article, we can just pipe that into our script:

curl 'https://extractorapi.com/api/v1/extractor/?apikey=<API-TOKEN>&url=https://en.wikipedia.org/wiki/Speech_synthesis' | jq -r .text | ./kokoro-tts.py --output speech-synthesis.mp3 -v bm_george

But what if we took the laziness one step further?

navi is one of my all-time favorite productivity tools in the terminal. If you haven't heard of it, I strongly recommend it. Here's a nice navi cheat that takes an article URL, lets you pick a voice, and generates the audio (note that you'll probably want to add your Extractor API token in there, and you may want to use a consistent path to the kokoro-tts.py script):

% tts

# TTS: Generate Audio for text from URL
curl 'https://extractorapi.com/api/v1/extractor/?apikey=<API_TOKEN>&url=<url>' | jq -r .text | ./kokoro-tts.py --output <output_filename>.mp3 -v <voice>

$ voice: python3 -c 'print("\n".join(["af_alloy", "af_aoede", "af_bella", "af_heart", "af_jessica", "af_kore", "af_nicole", "af_nova", "af_river", "af_sarah", "af_sky", "am_adam", "am_echo", "am_eric", "am_fenrir", "am_liam", "am_michael", "am_onyx", "am_puck", "am_santa", "bf_alice", "bf_emma", "bf_isabella", "bf_lily", "bm_daniel", "bm_fable", "bm_george", "bm_lewis"]))'
$ output_filename: python3 "`navi info cheats-path`/media/tts_candidates.py" $url

You may notice that tts_candidates.py is a script of its own. All it does is try to get a nice filename from the URL and then print some generic candidates:

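#!/usr/bin/env python3
"""Print candidate output filenames for a URL, one per line."""
import sys
from urllib.parse import urlparse

# Path segments that make poor filenames (empty strings come from leading/trailing slashes).
EXCLUDED = {'en', '', '/', '?'}

# Walk the URL path from the last segment backwards and use the first usable one.
for path_element in reversed(urlparse(sys.argv[1]).path.split('/')):
    if path_element not in EXCLUDED:
        print(path_element)
        # If the segment looks like "page.html", also offer it without the extension.
        if '.' in path_element:
            print('.'.join(path_element.split('.')[:-1]))
        break

# Generic fallbacks, always offered.
print("output")
print("article")
print("tts_output")
print("tts_article")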
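
To make the behaviour concrete, here is the same logic wrapped in a function so it can be tried on a few URLs. The function name filename_candidates is hypothetical, and example.com stands in for a real site.

```python
from urllib.parse import urlparse

EXCLUDED = {'en', '', '/', '?'}
GENERIC = ["output", "article", "tts_output", "tts_article"]


def filename_candidates(url):
    """Return filename suggestions for a URL, most specific first."""
    candidates = []
    for path_element in reversed(urlparse(url).path.split('/')):
        if path_element not in EXCLUDED:
            candidates.append(path_element)
            if '.' in path_element:
                # Also offer the name without its last extension.
                candidates.append(path_element.rsplit('.', 1)[0])
            break
    return candidates + GENERIC


print(filename_candidates("https://en.wikipedia.org/wiki/Speech_synthesis"))
print(filename_candidates("https://example.com/blog/post.html"))
```

The first call lists Speech_synthesis followed by the four generic names; the second lists post.html and post first. A URL whose path contains only excluded segments, such as https://example.com/en/, yields just the generic names.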