Digitized speech on the PICO-8?

pingo • 2019-02-18 14:26

Remember SAM? (https://simulationcorner.net/index.php?page=sam).

Would it be possible in some way to use the white-noise for samples and/or digi-speech?

I'm not talking about a full-blown conversion of SAM (https://github.com/s-macke/SAM), but maybe a smaller version. Also - shouldn't it be possible to use at least 3-bit samples on the PICO-8? (https://gist.github.com/munshkr/30f35e39905e63876ff7)

I'm not the person to dive into this, but maybe someone else might be able to do some sort of conversion of the above. Personally, I would really enjoy being able to have short/small samples and make the Pico speak!

/ Pingo

freds72 • 2019-02-19 05:38

Gruber tried a while back but didn't get anything beyond half a 'hello' :/
Pico sound is very limited compared to a c64 or the like.

musurca • 2019-02-19 07:58

I was tinkering with this about a year ago—but instead of using samples, I was trying to do formant synthesis to generate a voice on the fly (sort of like a Speak 'n Spell). It was never amazing, but this is as far as I got:

Formant Synthesis Demo by musurca
Cart #spicospell_demo-2 | 2019-02-19 | License: CC4-BY-NC-SA

The idea is that you break down the speech into phonemes, then build up the sound by generating formants, consonants, and glottal sounds (note that I'm cheating on the last one)—

--[[
 "'a'a'aiii  ^a^a^a^ann ^a^a  
   p yyy  koooo   ueeee p"
]]--
make_formant("'a",3)
make_formant("i",3)
make_breath(2)
make_formant("^a",4)
make_formant("n",2)
make_breath(2)
make_formant("^a",2)
make_breath(2)
make_glottal("p")
make_breath()
make_formant("y",3)
make_breath(2)
make_glottal("k")
make_formant("o",4)
make_breath(3)
make_formant("u")
make_formant("e",4)
make_breath()
make_glottal("p")

pingo • 2019-02-19 13:09

Thanks, both of you, for this.

freds72: Even though the Pico doesn't have the filter and other cool stuff the 64 has - if I remember correctly from reading the above (which was months ago, so forgive me if I'm wrong here), I assume that the Pico should be able to play 3-bit samples at least. Correct?

musurca: That is a really cool program there! Impressive! It only needs tweaking. :)

Did you look at the SAM-code for this, or is this your own?

This is really two different questions:

1. Digitized samples ("digis"), and
2. Phonemes, generated internally.

I enjoy both approaches. Maybe the 2nd one is easier to do, but it would be very interesting to see if the Pico can handle a few digis as well.

/ Pingo

electricgryphon • 2019-02-19 16:22

musurca,

That's pretty amazing! Did you find any good references while you were putting this demo together?

Thanks!

musurca • 2019-02-20 05:48

pingo— it's my own code and written in near-total ignorance, although I'd be curious to read the SAM source to see how it should be done properly. But generating sounds on the PICO-8 is a little trickier because you don't have low-level control over the speaker as you would on the C64—you just have a programmable tracker that can play 64 different musical notes from C(2) to Eb(7). In this demo I'm just playing the notes that are closest to the frequencies of the formants on different channels at the same time.
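A minimal sketch of that "closest note" step, written in Python rather than PICO-8 Lua so it can be run outside the console. It assumes the 64 pitches are equal-tempered with pitch 33 at 440 Hz, which puts pitch 0 at about 65.4 Hz and pitch 63 at about 2489 Hz, consistent with the range described above. The function names and the formant values (rough textbook figures) are illustrative and are not taken from the demo cart.

```python
import math

NUM_NOTES = 64     # tracker pitches 0..63
A440_INDEX = 33    # assumption: pitch 33 is the A at 440 Hz


def note_to_hz(note):
    """Frequency of a tracker pitch, assuming 12-tone equal temperament."""
    return 440.0 * 2.0 ** ((note - A440_INDEX) / 12.0)


def hz_to_note(hz):
    """Nearest pitch index for a frequency, clamped to the 0..63 range."""
    note = round(A440_INDEX + 12.0 * math.log2(hz / 440.0))
    return max(0, min(NUM_NOTES - 1, note))


# Approximate first two formants in Hz; illustrative values only.
FORMANTS = {"i": (240, 2400), "a": (850, 1610)}


def formant_notes(vowel):
    """Pitches to play at the same time, one channel per formant."""
    return [hz_to_note(f) for f in FORMANTS[vowel]]


if __name__ == "__main__":
    for vowel in FORMANTS:
        print(vowel, formant_notes(vowel))
```

Because neighbouring pitches are a full semitone apart, each formant can land up to about 3% away from its target frequency, which may be part of why the result sounds musical rather than natural.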

To play digitized samples on the Pico-8, I'd probably try a similar procedure: perform some kind of Fourier analysis on the audio at a very coarse sample rate (<30Hz), output the 4 strongest frequencies between 65 and 2489Hz for each sample, then write some code to quantize that data into musical notes for the tracker.
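One way to sketch that pipeline offline, in Python. Instead of a full FFT followed by a separate quantization pass, this version swaps in a simpler technique: it evaluates a single DFT bin directly at each of the 64 note frequencies and keeps the four strongest, so the result is already expressed as tracker pitches. The 22050 Hz sample rate, the split into 30 frames per second, the 0..7 volume scale and all function names are assumptions for illustration, not code from any actual cart or tool.

```python
import cmath
import math

SAMPLE_RATE = 22050              # assumption: input resampled to 22.05 kHz
FRAME_LEN = SAMPLE_RATE // 30    # one analysis frame per 1/30 s (735 samples)


def note_to_hz(note):
    """Equal-tempered pitch frequency, assuming pitch 33 = 440 Hz."""
    return 440.0 * 2.0 ** ((note - 33) / 12.0)


def magnitude_at(frame, hz):
    """Magnitude of a single DFT bin evaluated at an arbitrary frequency."""
    w = -2j * math.pi * hz / SAMPLE_RATE
    return abs(sum(x * cmath.exp(w * t) for t, x in enumerate(frame)))


def strongest_notes(frame, count=4):
    """Return up to `count` (pitch, volume 0..7) pairs, loudest first."""
    mags = sorted(
        ((magnitude_at(frame, note_to_hz(n)), n) for n in range(64)),
        reverse=True,
    )
    peak = mags[0][0]
    if peak == 0:
        return []  # silent frame: nothing to play
    # Volume is scaled relative to the loudest pitch in this frame.
    return [(n, round(7 * m / peak)) for m, n in mags[:count]]


if __name__ == "__main__":
    # Test tone: 440 Hz plus a quieter 880 Hz overtone.
    tone = [
        math.sin(2 * math.pi * 440 * t / SAMPLE_RATE)
        + 0.5 * math.sin(2 * math.pi * 880 * t / SAMPLE_RATE)
        for t in range(FRAME_LEN)
    ]
    print(strongest_notes(tone))
```

A 1/30 s frame only resolves frequencies to within roughly 30 Hz, while the lowest pitches are just a few Hz apart, so the neighbours of a strong note also score highly. A more careful version would keep only local peaks before choosing the four channels.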

electricgryphon— some links for ya:

https://en.wikipedia.org/wiki/Formant (general explanation, first two formant frequencies for most vowels)
http://pages.mtu.edu/~suits/notefreqs.html (frequency of musical notes)
http://www.asel.udel.edu/speech/tutorials/synthesis/cons1.html (consonants)
http://www.lel.ed.ac.uk/~jkirby/hanoi/slides/lecture15-hanoi-4up.pdf (nasals)
http://www.asel.udel.edu/speech/tutorials/synthesis/vowels.html (web-based formant synth)

EDIT: ... and this audio recording from the 40s, in which "speech synthesis" is performed along the same principles with a harmonica. Our brains are so accustomed to isolating speech that we'll hear it in anything, if the tones are within the rough frequency ranges of formants, and if we're properly primed (with subtitle text, for example).

pingo • 2019-02-24 16:59

Thanks musurca for the info and those links!

I would really like to see how you'd tackle samples. Would be interesting to see if it would be possible to play some sort of digi-sounds on the Pico 8.

/ Pingo

dddaaannn • 2019-02-24 20:16

See also the talking piano experiment, which benefits from speed and polyphony that PICO-8 doesn't have, and still is barely intelligible: https://www.youtube.com/watch?v=muCPjK4nGY4

dothtm • 2019-03-26 19:52

I just became aware of this 1939 example:

https://www.youtube.com/watch?v=0rAyrmm7vv0

One JavaScript implementation: http://griffinmoe.com/voder/

pingo • 2019-03-26 20:40

dothtm: Thank you for that one! Pretty awesome. A bit difficult to create sentences with that "piano", but I get the general idea. Would love to create that on the Pico-8!

/ Pingo

Roysterini • 2019-03-27 18:08

That's wonderful! It'd probably be tricky to understand dialogue, but it would definitely give the sense of dialogue with an npc/droid for example.

pingo • 2019-03-27 18:25

Roysterini: Exactly. So let's hope the real shop-experience now will be added to PicoBreed! :)

/ Pingo

Felice • 2019-04-15 00:05

@musurca

(Preface: I'm a dabbler in audio at best. My understanding is pretty superficial in most cases and I may not understand correctly in many of them.)

I've been wondering if it's possible to choose sine-wave notes that are power-of-two multiples in frequency, and then maybe use a DCT to decompose a waveform into its coefficients at 120Hz, then choose the four largest coefficients and set the volume and pitch of the four channels accordingly 120 times a second, which appears to be the maximum rate at which you can make changes.

I'm not sure if the frequencies necessary to build up a waveform from coefficients DCT-style even exist in the PICO-8 waveform generator, and I'm betting 4 voices with a volume resolution of 3 bits each would make it sound awful even if so, but hey, this is what I had in my head from numerous nights of staring at the ceiling and then today seeing your post on the subject. ;)

Edit: oh, and I guess you'd need notes whose frequencies are multiples of 120 as well, or you'd get popping, hmm.
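A quick way to check which pitches would qualify, as a Python sketch. It assumes the 64 pitches are equal-tempered with pitch 33 at 440 Hz (so pitch 0 is about 65.4 Hz) and measures how far each one sits from the nearest whole multiple of 120 Hz, in cents. The helper names and the 10-cent tolerance are arbitrary choices for illustration.

```python
import math

UPDATE_HZ = 120.0  # update rate mentioned in the post above


def note_to_hz(note):
    """Equal-tempered pitch frequency, assuming pitch 33 = 440 Hz."""
    return 440.0 * 2.0 ** ((note - 33) / 12.0)


def cents_from_multiple(hz, base=UPDATE_HZ):
    """Signed distance in cents from hz to the nearest (in Hz) multiple of base."""
    k = max(1, round(hz / base))  # never compare against 0 Hz
    return 1200.0 * math.log2(hz / (k * base))


def notes_near_multiples(tolerance_cents=10.0):
    """Pitch indices lying within the tolerance of some multiple of 120 Hz."""
    return [
        n for n in range(64)
        if abs(cents_from_multiple(note_to_hz(n))) <= tolerance_cents
    ]


if __name__ == "__main__":
    for n in notes_near_multiples():
        hz = note_to_hz(n)
        print(n, round(hz, 1), round(cents_from_multiple(hz), 1))
```

Under these assumptions the octaves of 120 Hz itself (120, 240, 480, ...) fall roughly midway between two adjacent pitches, about 50 cents from each, so the power-of-two series is a poor fit. Some other multiples do better: 1320 Hz lands within a couple of cents of pitch 52.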

IMLXH • 2019-04-15 20:03

https://www.lexaloffle.com/bbs/?tid=2513

My time to shine, eh?

Yeah, the P8's sound "chip" is way too limited to generate speech within its limits. IIRC SAM was done by poking the audio output directly somehow?

Felice • 2019-04-15 20:10

Yeah, and I think SAM did it so fast that it had to turn the screen off so it'd get to use every cycle available. It was definitely updating SID at more than 120Hz.

musurca • 2019-04-15 21:35

@Felice: interesting idea. If it's helpful as you work through it, I've put up a Git repository of my first stab at doing arbitrary digitized audio using Fourier analysis: https://github.com/musurca/pico8_fftsampler

(I can get it to work with simple tones, and it will mimic the rhythm and overall tone of more complex waveforms, but mostly it just sounds like very, very, very compressed audio -- which it is.)

@IMLXH: see the Formant Synthesis Demo above, and discussion that follows. In short, it is possible to generate speech with Pico-8 by means of the "talking piano" method, although clarity of articulation is an issue.

dw817 • 2019-10-31 03:32

Wholly Smokes, that is VERY impressive, @musurca ! You get my star.

Must've taken you ages to figure this out. And yeah, Pico-8's own soundboard leaves a lot to be desired.

Here's hoping @zep will increase the sound standard until it's at least as good as Atari 2600 for the cost of SFX storage.

So double the resolution at double the storage space. Do we really need 64 SFX for every cart?

Cut it in half and give us double the pitches, extra low for white noise (6) so we can have true space engine or rumbling and high-pitch for warble birds, or in this case, higher quality speech.

IMLXH • 2019-12-13 23:36

@musurca I was attempting formant synthesis with my own program, actually. I did it a while ago so I don't know whether I was looking at a vowel table or analyzing recorded audio, but essentially the tone frequencies I used were equal temperament approximations of formants. I found it interesting that the end result turned out so musical.

Also, I LOVE SAM. Which...you might have known if you ever YouTube searched my username. :P

dw817 • 2019-12-14 00:23

One of the games our team was planning, I wanted to have voice synthesis in it. And I think it's a lot easier than some may make it out to be. Apple ][ did it nicely for instance and CPU wise it's a lot worse off than Pico-8.

It won't sound good because of the harsh limitations on changing pitch, but it'll sound good enough that someone can definitely make out the consonants and vowels and what is spoken.

The key I see is not so much the sound itself but the TIMING of the connected phonemes. The way they blend together. It has to have timing accurate to within a 30th of a second from each other.

Experimentation is key. I want to have it in the game though so no separate library function on this, not yet.

So far the previous examples I have seen are nothing short of brilliant.

wallgraffiti • 2021-03-20 17:53

I've read this, and was incredibly interested.
THEN I remembered reading about Toki Pona, a language with far fewer possible sounds than English and much simpler pronunciation, so at the moment I'm trying to make every consonant/vowel possible. Wish me good luck!

vanni • 2021-03-20 22:49

Impossible Mission R.T. cart:
https://www.lexaloffle.com/bbs/?pid=88937

wallgraffiti • 2021-03-21 10:35

I actually found that post right before checking back on my comment in this thread! It's amazing and will surely aid the next step towards speech synthesis in PICO-8. I'm excited!

wallgraffiti • 2021-03-21 10:46

Alright, update on the Toki Pona synth. I'm trying to make a system where every syllable is a single pattern in the music editor, and you start with a vowel, onto which you can add up to two "hitters", such as the strong sound at the beginning of a "p" or "t", and that's going well. Only problem is I can't seem to get the "n" sound going for some reason. I'll keep updating.

vanni • 2021-10-07 13:37

Relevant:
Sample Sandbox https://www.lexaloffle.com/bbs/?tid=42809
