The Linux Text-To-Speech mini-HOWTO

Author: Rhandeev Singh
Version 0.02, Last update: 99/05/14

This document describes how to set up text-to-speech facilities on a
Red Hat 6.0 system using Festival and MBROLA.

================
1. Introduction
----------------

This is a description of what I did to get my sound-enabled Linux
system to read various documents to me without resorting to commercial
software. I've tried various free Text-To-Speech (TTS) systems, and so
far, Festival with MBROLA gives the best TTS quality I've heard.

I've also tweaked some of the Scheme files a bit so I can define
different speeds/pitches, and added support so lynx can read aloud a
variety of formats (HTML, text, PostScript).

==========================
2. Obtaining the software
--------------------------

As of this writing, you can obtain the required software from the
Blind Linux Archive or any mirror, e.g.

    ftp://leb.net/blinux/festival/cstr.mirror/Linux-1.3.1/\
      redhat/RPMS/i386/glibc2
    ftp://leb.net/blinux/mbrola/blinux/RPMS/

or from rpmfind.net:

    http://rpmfind.net/linux/RPM/Utilities_Speech.html
    http://rpmfind.net/linux/RPM/Utilities_Sound.html

See the Blind Linux Home Page for more information:

    http://leb.net/blinux/index.html

You will need the following RPMs for Red Hat 6.0 on i386:

    festival-1.3.1-2.i386.rpm
    festlex_POSLEX-1.3.1-1.noarch.rpm
    festvox_en1-1.3.1-1.noarch.rpm
    speech_tools-libs-1.1.1-2.i386.rpm

    festlex_OALD-1.3.1-1.noarch.rpm    # for British English
    festlex_CMU-1.3.1-1.noarch.rpm     # for American English

You will also need MBROLA, plus at least one voice database of your
choice (British English is the most straightforward to set up):

    mbrola-3.01g1-1.i386.rpm

    mbrola-en1-2.0-1.i386.rpm          # British English
    mbrola-us1-980512-1.i386.rpm       # American (male)
    mbrola-us2-980812-1.i386.rpm       # American (female)

If you need tarballs, visit the Festival project at

    http://www.cstr.ed.ac.uk/projects/festival.html

and the MBROLA project at

    http://tcts.fpms.ac.be/synthesis/mbrola.html
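
Before installing, it can help to check that nothing from the lists
above got lost during the download. The following is only a rough
sketch, not part of the original setup: the function name missing_rpms
is made up for illustration, and the list assumes you chose the British
English voice (swap in the CMU lexicon and a us1/us2 database for
American English).

```shell
#!/bin/sh
# Sketch: report which of the RPMs listed above are missing from a
# download directory. The name missing_rpms is hypothetical; the file
# names are the ones given in this section (British English choice).

REQUIRED_RPMS="festival-1.3.1-2.i386.rpm
festlex_POSLEX-1.3.1-1.noarch.rpm
festvox_en1-1.3.1-1.noarch.rpm
speech_tools-libs-1.1.1-2.i386.rpm
festlex_OALD-1.3.1-1.noarch.rpm
mbrola-3.01g1-1.i386.rpm
mbrola-en1-2.0-1.i386.rpm"

# Print the name of every required RPM that is not present in the
# directory given as $1 (default: the current directory).
# Returns 0 only if nothing is missing.
missing_rpms() {
    dir=${1:-.}
    status=0
    for rpm in $REQUIRED_RPMS; do
        if [ ! -f "$dir/$rpm" ]; then
            echo "$rpm"
            status=1
        fi
    done
    return $status
}
```

After sourcing the file, something like `missing_rpms /tmp/rpms` prints
one line per missing package and prints nothing when the directory is
complete.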

================
3. Installation
----------------

Here, I ASSUME YOUR LINUX SOUND HARDWARE IS ALREADY SET UP. See the
Linux Sound-HOWTO if it isn't. In particular, you'll need /dev/dsp to
be functioning properly.

Installing the Red Hat RPMs is pretty straightforward:

    bash$ rpm -Uhv fest* speech_tools* mbrola*

Next, create some symbolic links for British English:

    cd /usr/lib/festival/lib/voices/english/rab_mbrola
    ln -s /usr/lib/mbrola/en1/en1
    ln -s /usr/lib/mbrola/en1/en1mrpa

Alternatively, you could install other diphone databases for MBROLA by
replacing "en1" above with the respective database name. However, I
have never tried any of them, and you'd need to get Festival to speak
those languages first. As of this writing, MBROLA diphone databases
were also available for French, German, Spanish, and a variety of
other languages. See the Blind Linux Home Page.

=============================
4. Configuration and Testing
-----------------------------

-------------------------
4.1. Basic Configuration
-------------------------

Insert a line in /usr/lib/festival/lib/voices.scm:

    (defvar default-voice-priority-list
      '(rab_mbrola        ; <= INSERT THIS LINE
        rab_diphone       ;    (without the semicolons,
        ked_diphone       ;     of course)

And you're done. This tells Festival to try to use MBROLA first.

You can test your setup by running festival like this:

    bash$ festival
    festival> (SayText "Type any text here. Sounds cool?")
    festival> (quit)

Festival can be configured to pre-process various file formats, e.g.
HTML, to make them more "readable". See the festival documentation in
/usr/doc/festdoc-1.2.0/festival/festival.info for information on how
to set this up. It's under the section "Text modes". As for me, I
chose to write a good old sed script to do my text pre-processing
instead.

You can run festival in text-to-speech mode on a file:

    festival --tts FILENAME

or on a stream:

    lynx -dump URL | tee /tmp/x | festival --tts & \
      sleep 1; less /tmp/x

If you're satisfied and it works, you can stop here.
But read on for some tweaks if you're curious :)

-------------------------
4.2. Voice Style Support
-------------------------

This section shows you how to define and select pitch and speed
presets.

Copy Appendix A.1 into the new file /usr/lib/festival/lib/siteinit.scm
to define the necessary Scheme procedures for handling voice "styles".

Put Appendix A.2 into /etc/festival.conf to define some system-wide
styles.

Next, insert a line near the end of /usr/lib/festival/lib/init.scm:

    ;;; Default voice (have to do something cute
    ;;; so autoloads still work)
    (eval (list voice_default))
    (Style style_default)           ; <= INSERT THIS LINE

    (provide 'init)

You can put user-defined "styles", including a default, into
$HOME/.festivalrc like this:

    ;; User-defined styles
    (NewStyle 'my_slow 100 24 1.0)  ; A slow baritone voice
    (NewStyle 'my_fast 140 50 0.8)  ; A fast tenor voice
    (set! style_default 'my_slow)   ; My default style

The first number is the mean pitch (Hz). The second is the pitch
standard deviation; it is passed to Festival as target_f0_std. The
third is the duration scaling factor: 1.0 is the normal speed, and
smaller values speak faster.

-------------------------------------
4.3. Changing Styles Inside A Script
-------------------------------------

If you're running festival on a script in batch mode, you can change
the current "style" within the script like this:

    (Style 'male_faster)

As an example, see the script in Appendix A.3, which adds style
selection support to festival's command-line TTS interface. If you
modify it to add other functionality, do let me know!

---------------------------------
4.4. Selecting The Output Method
---------------------------------

Some people prefer to use NCD's Network Audio System (NAS) so that
they can play MP3s through their sound hardware and use festival at
the same time.
To do this, insert the following into /etc/festival.conf:

    ;;; ===================
    ;;; Audio Output Method
    ;;; ===================
    (Parameter.set 'Audio_Method 'Audio_Command)
    (Parameter.set 'Audio_Required_Rate 16000)
    (Parameter.set 'Audio_Required_Format 'snd)
    (Parameter.set 'Audio_Command
                   "/usr/X11R6/bin/auplay -volume 100 $FILE")

---------------------------------------------
4.5. Turning Off Garbage Collection Messages
---------------------------------------------

It is useful to turn off garbage-collection messages if you're using
an interactive setup, like making lynx read web pages to you (which
you can set up in lynx.cfg).

To disable garbage collection messages, insert this in either
/etc/festival.conf or $HOME/.festivalrc:

    (gc-status nil)

-------------------------------
4.6. Queries And Contributions
-------------------------------

If you're having trouble getting festival to work, or have anything to
contribute to this mini-HOWTO, you can contact the author:

    Rhandeev Singh
    http://www.comp.nus.edu.sg/~rhandeev

    Linux User Group
    http://linux.comp.nus.edu.sg
    National University of Singapore

======================================================================
                              APPENDICES
======================================================================

=====================================================
A.1. Template for /usr/lib/festival/lib/siteinit.scm
-----------------------------------------------------

(WARNING: Scheme code follows; you don't have to understand this)

    ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
    ;;; Site Initialisation file
    ;;; This is loaded near the end of init.scm,
    ;;; just before user initialisation file
    ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

    ;;; ==========================
    ;;; Style Management Functions
    ;;; ==========================

    (defvar Styles '((default 140 22 1.0))
      "Available voice styles")

    (defvar style_default 'default
      "Default voice style")

    (defvar current_style 'none
      "Current voice style")

    (define (Style selected_style)
    "(Style DEFINED_STYLE)
    Sets the pitch, pitch standard deviation, and speed of the
    current voice. Type 'Styles' for a list of defined styles."
      (let ((style (assoc selected_style Styles)))
        ;; Fall back to the built-in default for unknown style names
        (if (not style)
            (set! style (assoc 'default Styles)))
        (let ((model_mean  (cadr (assoc 'model_f0_mean int_lr_params)))
              (model_std   (cadr (assoc 'model_f0_std int_lr_params)))
              (new_mean    (cadr style))
              (new_std     (cadr (cdr style)))
              (new_stretch (cadr (cdr (cdr style)))))
          ;; Keep the voice's model values; replace only the targets
          (set! int_lr_params
                (list (list 'target_f0_mean new_mean)
                      (list 'target_f0_std  new_std)
                      (list 'model_f0_mean  model_mean)
                      (list 'model_f0_std   model_std)))
          (Parameter.set 'Duration_Stretch new_stretch)
          (set! current_style (car style))
          (list (car style) new_mean new_std new_stretch)
        )
      )
    )

    (define (NewStyle style_name mean std stretch)
    "(NewStyle STYLE_NAME MEAN STD STRETCH)
    Defines a new style; MEAN and STD refer to pitch mean and
    standard deviation, while STRETCH refers to inverse speed,
    1.0 being the standard."
      (set! Styles (cons (list style_name mean std stretch) Styles)))

    ;; Load the system-wide style definitions, if present
    (if (probe_file "/etc/festival.conf")
        (load "/etc/festival.conf"))

=====================================
A.2. Template for /etc/festival.conf
-------------------------------------

    ;;; =================
    ;;; Style Definitions
    ;;; =================
    (NewStyle 'male_frozen      80 10 1.5 )
    (NewStyle 'male_slow       100 22 1.1 )
    (NewStyle 'male_tenor      140 60 1.0 )
    (NewStyle 'male_baritone   100 40 1.0 )
    (NewStyle 'male_bass        70 25 1.0 )
    (NewStyle 'male_relaxed    100 24 0.95)
    (NewStyle 'male_newscaster 140 32 0.85)
    (NewStyle 'male_hurried    117 22 0.80)
    (NewStyle 'male_stressed   150 30 0.70)
    (NewStyle 'male_fast       110 22 0.70)
    (NewStyle 'male_faster     110 22 0.60)
    (NewStyle 'male_panic      170 20 0.60)
    (NewStyle 'male_fastest    110 22 0.55)
    (NewStyle 'the_flash       110 22 0.45)

===============================
A.3. Script with Style Support
-------------------------------

Here's a script that adds voice "style" support to the festival
text-to-speech interface. You can use it as a template to add further
functionality. Do let me know if you do!

Here's very briefly how to use it, assuming it's named "saytext":

    saytext -h        # gives a (cryptic) help message
    saytext [-s