Marcel Timm, RhinoDevel, 2025
mt_tts is a C++ library for Linux and Windows that offers a pure C interface to the awesome text-to-speech system called Piper by Michael Hansen.
mt_tts supports:
- Text to wave file.
- Text to raw audio samples.
- Text to raw audio stream-to-stream conversion.
After cloning the mt_tts repository, enter its folder and proceed as follows.
Get the Piper submodule content:
git submodule update --init --recursive
If you want to build Piper in debug instead of release mode, you need to modify the following in the file CMakeLists.txt: change fmt and spdlog under target_link_libraries( (two times) into fmtd and spdlogd.
No details for Linux here, yet, but you can take a look at the Windows instructions below and at the Makefile.
Build Piper
- Open Visual Studio developer commandline.
- Go to the Piper submodule folder:
cd mt_tts\piper
- Create build folder:
mkdir build
- Enter build folder:
cd build
- Prepare the build:
cmake ..
- Start the build (this will also download stuff Piper needs from the internet):
cmake --build . --config Release
For debug mode, use this instead:
cmake --build .
Test Piper (without mt_tts)
Copy the following (from different output directories in the build folder) into a new folder:
- build\pi\share\espeak-ng-data (the whole folder)
- build\pi\bin\espeak-ng.dll
- build\pi\bin\piper_phonemize.dll
- build\pi\lib\onnxruntime.dll
- build\Release\piper.exe (or build\Debug\piper.exe)
Download a voice and its configuration, e.g. one for German speech output by Thorsten Müller:
Store these files in the same, new folder.
Create a WAV file:
echo "Ich bin ein Mensch, Du auch?" | piper --model de_DE-thorsten-high.onnx --config de_DE-thorsten-high.onnx.json
Output directly to the speakers with ffplay from ffmpeg (the ffplay parameters in this example may not be optimal):
echo "Hallo, ich bin kein Mensch, was man auch einigermaßen leicht heraushören kann, meinst Du nicht auch? Trotzdem ein tolles TTS-System!" | piper --model de_DE-thorsten-high.onnx --config de_DE-thorsten-high.onnx.json --output_raw | ffplay.exe -f s16le -ar 22050 -
- Open solution mt_tts.sln with Visual Studio (tested with 2022).
- Compile in release or debug mode.
- Get the DLL and LIB files resulting from the build, e.g. for release mode x64\Release\mt_tts.dll and x64\Release\mt_tts.lib, copy them to a new folder.
- Also copy the file mt_tts\mt_tts.h to that new folder.
- Copy the following stuff from Piper to the new folder, too:
  - build\pi\share\espeak-ng-data (the whole folder)
  - build\pi\bin\espeak-ng.dll
  - build\pi\bin\piper_phonemize.dll
  - build\pi\lib\onnxruntime.dll
- Also copy a voice model file and its configuration file to the same new folder.
- Open the x64 Native Tools Command Prompt for VS 2022 commandline.
- Go to the new folder and create a file main.c with the following code:
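#include "mt_tts.h"

int main(void)
{
    /* Set up mt_tts with the voice model and its configuration file. */
    mt_tts_reinit("de_DE-thorsten-high.onnx", "de_DE-thorsten-high.onnx.json");

    /* Convert the text to speech and write the result to output.wav. */
    mt_tts_to_wav_file(
        "Hallo, nun testen wir dieses kleine Hilfsmodul.", "output.wav");

    /* Shut mt_tts down again. */
    mt_tts_deinit();
    return 0;
}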
- Compile via cl main.c mt_tts.lib.
- Run main.exe, which will create the WAV file output.wav.