Add speech synthesis to your games with Speako8, a speech synthesis library for PICO-8 in under a thousand tokens! It's loosely based on a Klatt synthesizer and will remind some folks of Software Automatic Mouth (S.A.M.)
To add Speako8 to your games, copy and paste the library below:
Speako8's voice must be configured prior to first use:
function _init()
 spk8_pitch,spk8_rate,spk8_volume,spk8_quality,spk8_intonation,spk8_if0,spk8_shift,spk8_bandwidth,spk8_whisper=
  140,1,1,.5,10,10,1,1,1
end
Warning: This application may generate loud and harsh sounds. Protect your hearing! Do not test with headphones on and turn down the volume.
|Variable|Range|Description|
|---|---|---|
|spk8_pitch|60-230|Fundamental pitch of voice (F0) in hertz.|
|spk8_rate|.1-2|Standard rate of speech divisor: below 1 is slower; above 1 is faster.|
|spk8_volume|.1-2|Standard volume factor: below 1 is quieter; above 1 is louder.|
|spk8_quality|.1-2|Glottis open period: below .5 is creakier voice; .5 is modal voice; 1 is breathy voice; above 1 is weaker voice.|
|spk8_intonation|0-20|Degree of pitch prosody in hertz. Set to zero for robotic monotone.|
|spk8_if0|0-20|Degree to which the inherent pitch (F0) of vowels varies, in hertz. Set to zero for robotic monotone.|
|spk8_shift|.8-1.2|Factor by which to shift formant frequencies (F1, F2, F3): above 1 raises formant frequencies; below 1 lowers them.|
|spk8_bandwidth|.5-5|Factor by which to alter bandwidth of formant frequencies (F1, F2, F3): below 1 narrows the bandwidth of the formants; above 1 widens it. When formants are shifted upward, it is recommended to widen bandwidths as well.|
|spk8_whisper|1 or 2|Speaking mode: normal voice is 1; whispering is 2.|
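If you want to switch voices at runtime without worrying about going out of range, something like the following might help. It is only a sketch: `set_voice`, `clamp`, and `spk8_ranges` are hypothetical names, not part of Speako8. The ranges and defaults are copied from the table and the `_init` example above, and the helper will cost you some extra tokens.

```lua
-- documented {min,max} ranges, copied from the table above
spk8_ranges={
 pitch={60,230},rate={.1,2},volume={.1,2},quality={.1,2},
 intonation={0,20},if0={0,20},shift={.8,1.2},bandwidth={.5,5}
}

-- keep v inside lo..hi
function clamp(v,lo,hi)
 if v<lo then return lo end
 if v>hi then return hi end
 return v
end

-- hypothetical helper (not part of speako8): sets every voice
-- global from table v, falling back to the defaults used in the
-- _init example and clamping to the documented ranges.
function set_voice(v)
 local r=spk8_ranges
 spk8_pitch=clamp(v.pitch or 140,r.pitch[1],r.pitch[2])
 spk8_rate=clamp(v.rate or 1,r.rate[1],r.rate[2])
 spk8_volume=clamp(v.volume or 1,r.volume[1],r.volume[2])
 spk8_quality=clamp(v.quality or .5,r.quality[1],r.quality[2])
 spk8_intonation=clamp(v.intonation or 10,r.intonation[1],r.intonation[2])
 spk8_if0=clamp(v.if0 or 10,r.if0[1],r.if0[2])
 spk8_shift=clamp(v.shift or 1,r.shift[1],r.shift[2])
 spk8_bandwidth=clamp(v.bandwidth or 1,r.bandwidth[1],r.bandwidth[2])
 -- whisper is a mode switch: 2 when v.whisper is set, else 1
 spk8_whisper=v.whisper and 2 or 1
end
```

For example, `set_voice({pitch=90,shift=.9})` would give a lower voice with everything else at the defaults, and `set_voice({whisper=true})` would switch to whispering.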
To enable speech synthesis, you must call speako8() in the _update function. Then call say() with a speech string:
function _update()
 speako8() -- must appear once unconditionally in _update.
 if btnp(5) then --❎ button
  -- use speaking if you want to check if anything is currently being said.
  -- if not speaking() then
  say("_/hh/-1.57/eh/-1.07/l/-1.33/3/ow/-1.03/-3/w/1.27/-3/er/1.65/-3/l/-1.64/-3/_/d")
  -- end
 elseif btnp(4) then -- 🅾️ button
  mute() --flushes the sound queue and immediately stops whatever is currently being said.
 end
end
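One way to build on speaking() is to hold several lines of dialogue and leave a short pause between them. The sketch below is just one possible approach: `dlg_enqueue`, `dlg_update`, and the `dlg_` variables are hypothetical names, the 15-frame gap is an arbitrary choice, and the only Speako8 functions used are say() and speaking() from the example above.

```lua
-- hypothetical dialogue queue; only say() and speaking()
-- come from speako8, everything else is illustrative.
dlg_lines={} -- pending speech strings
dlg_head=1   -- index of the next string to say
dlg_gap=15   -- frames of silence between lines (assumed value)
dlg_wait=0   -- frames left in the current pause

function dlg_enqueue(s)
 dlg_lines[#dlg_lines+1]=s
end

-- call once per frame from _update, after speako8()
function dlg_update()
 if speaking() then
  -- something is being said: re-arm the pause
  dlg_wait=dlg_gap
 elseif dlg_wait>0 then
  dlg_wait=dlg_wait-1
 elseif dlg_head<=#dlg_lines then
  say(dlg_lines[dlg_head])
  dlg_head=dlg_head+1
 end
end
```

You would call `dlg_enqueue` with speech strings made in Declare and then call `dlg_update()` right after `speako8()` in `_update`.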
Speech strings represent text phonetically and include prosody markup. The easiest way to create and test them is with the Declare web app. You can also try out different voice options with it.
If you are interested in learning more about speech synthesis, try googling these key words: Klatt synthesizer, formant, acoustic phonetics. It's a fascinating topic. An unminimized version of the library is included in the demo cart.
Special thanks to the gang on the Discord server for lending me their ears and keeping me motivated, especially @packbat, who did it for science. Thanks also to @IMLXH for finding the Klatt prosody rules.
Damn! And a nice little democart. It'll be fun to think about ways to use this. Maybe a song generator with vocals? A Spaceship-Announcement-System? Speaking NPCs in an RPG? Great work!
Haha this is AMAZING. The whispering was hilarious. :D
@dw817, I think it's possible to write phonetic and prosody rules to convert English text to speech strings and fit them into a cart, but I doubt very much that there would be many tokens left over for anything else.
@zep, yes, really looking forward to seeing what folks do with this. Glad you liked the name.
To everyone, thank you for your appreciations. They made my day.
Imagine using this in a video game. Maybe the creepy whispers would be perfect for a scary atmosphere...
kinda reminds me of Stephen Hawking's tts voice (rip)
@idk_how_to_code, that's because it kinda is! Hawking spoke via a Klatt synthesizer and Speako8 is loosely based on that synthesizer. Much of the formant data and prosody rules I used came from Klatt's writings and are based on his own speech samples. So, yeah Speako8 sounds like Hawking sounds like Klatt.
It sounded like Currah μSpeech for the ZX Spectrum lol
this is very cool, but what if you could feed it raw english text and let it say stuff?
@augusto99999, it could be done. You'd have to write rules for phoneticizing. They wouldn't be as accurate as the pronouncing dictionary that Declare uses, but they might be okay enough. You'd also have to convert the prosody rules from declare.js. I'm afraid, though, that once those tasks were done, there wouldn't be many tokens left for an actual game.
gabriel is coming in your games only for 1000 tokens of code
I've added this to my exploring game and finding the CPU/mem usage spikes crazy high when the talking happens, slowing everything else in the game down.
I guess it doesn't surprise me given the magic it's doing, but is there anything I can/should do to try and minimize the hit?
I'm just using the say() with a trigger like documented. It's an incredibly tight and well-thought-out component - super thanks for that - but that puts the magic way above my head to try and tweak it :)
@morningtoast, as you've noticed, it tends to drop frames when the buffer is loading. You can improve things quite a bit, though not totally eliminate the issue, by searching the Speako8 lib for "while stat(108)<1920" and changing it to "while stat(108)<256".
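To see why a smaller threshold could help, here is a rough model of a "top up the buffer every frame" loop. It is not Speako8's actual code: `simulate_fill` is a hypothetical function, and it assumes that stat(108) reports how many samples are still buffered for playback and that playback drains roughly 184 samples per frame (about 5512 samples per second at 30 fps).

```lua
-- hypothetical model of a per-frame buffer top-up loop;
-- this is not speako8's actual code.
-- assumed drain: ~5512 samples/sec at 30 fps
local samples_per_frame=184

-- returns a table with the number of samples that would be
-- synthesized in each frame for a given fill threshold.
function simulate_fill(threshold,frames)
 local buffered=0
 local work={}
 for f=1,frames do
  local made=0
  -- stands in for: while stat(108)<threshold do ... end
  while buffered<threshold do
   buffered=buffered+1
   made=made+1
  end
  work[f]=made
  -- playback drains the buffer before the next frame
  buffered=buffered-samples_per_frame
  if buffered<0 then buffered=0 end
 end
 return work
end
```

Under those assumptions, `simulate_fill(1920,3)` has to produce 1920 samples in the first frame, while `simulate_fill(256,3)` produces only 256. After that both settle at about 184 per frame, so the change would mostly shrink the spike when speech starts rather than the steady cost.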
Whispering sounds like a true intruder moment
Reminds me of SAM, the software automatic mouth for the c64 iirc. It sounds pretty similar!