Remember SAM? (https://simulationcorner.net/index.php?page=sam).
Would it be possible in some way to use the white-noise for samples and/or digi-speech?
I'm not talking about a full-blown conversion of SAM (https://github.com/s-macke/SAM), but maybe a smaller version. Also - shouldn't it be possible to use at least 3-bit samples on the PICO-8? (https://gist.github.com/munshkr/30f35e39905e63876ff7)
I'm not the person to dive into this, but maybe someone else might be able to do some sort of conversion of the above. Personally, I would really enjoy being able to have short/small samples and make the Pico speak!
I was tinkering with this about a year ago—but instead of using samples, I was trying to do formant synthesis to generate a voice on the fly (sort of like a Speak 'n Spell). It was never amazing, but this is as far as I got:
The idea is that you break down the speech into phonemes, then build up the sound by generating formants, consonants, and glottal sounds (note that I'm cheating on the last one)—
--[[ "'a'a'aiii ^a^a^a^ann ^a^a p yyy koooo ueeee p" ]]--
make_formant("'a",3)
make_formant("i",3)
make_breath(2)
make_formant("^a",4)
make_formant("n",2)
make_breath(2)
make_formant("^a",2)
make_breath(2)
make_glottal("p")
make_breath()
make_formant("y",3)
make_breath(2)
make_glottal("k")
make_formant("o",4)
make_breath(3)
make_formant("u")
make_formant("e",4)
make_breath()
make_glottal("p")
Thanks, both of you, on this.
freds72: Even though the Pico doesn't have the filter and other cool stuff the 64 has - if I remember correctly from reading the above (which was months ago, so forgive me if I'm wrong here), I assume that the Pico should be able to play 3-bit samples at least. Correct?
musurca: That is a really cool program there! Impressive! It only needs tweaking. :)
Did you look at the SAM-code for this, or is this your own?
This is really two different questions:
- Digitized samples, "digis", and
- Phonemes, generated internally.
I enjoy both approaches. Maybe the 2nd one is easier to do, but it would be very interesting to see if the Pico can handle a few digis as well.
pingo— it's my own code and written in near-total ignorance, although I'd be curious to read the SAM source to see how it should be done properly. But generating sounds on the PICO-8 is a little trickier because you don't have low-level control over the speaker as you would on the C64—you just have a programmable tracker that can play 64 different musical notes from C(2) to Eb(7). In this demo I'm just playing the notes that are closest to the frequencies of the formants on different channels at the same time.
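A minimal sketch of the "closest note" step described above, written in Python for offline experimenting rather than as PICO-8 code. It assumes the tracker's 64 pitches are equal-tempered semitones starting at C2 (about 65.4 Hz), which matches the C(2) to Eb(7) range mentioned in the post. The names `freq_to_note`, `vowel_to_notes`, and the `VOWEL_FORMANTS` table are hypothetical, and the formant values are rough textbook figures, not the ones the demo uses.

```python
import math

C2_HZ = 65.406   # assumed frequency of the lowest tracker pitch (C2)
NUM_NOTES = 64   # tracker pitches 0..63, i.e. C2 up to Eb7

def freq_to_note(freq_hz):
    """Return the tracker pitch index (0..63) nearest to freq_hz."""
    semitones = 12 * math.log2(freq_hz / C2_HZ)
    # Clamp, since formants can fall outside the playable range
    return max(0, min(NUM_NOTES - 1, round(semitones)))

# Approximate first two formants in Hz (illustrative values only)
VOWEL_FORMANTS = {
    "i": (240, 2400),
    "a": (850, 1610),
    "u": (250, 595),
}

def vowel_to_notes(vowel):
    """Map each formant of a vowel to its nearest pitch, one per channel."""
    return [freq_to_note(f) for f in VOWEL_FORMANTS[vowel]]
```

Each returned pitch would go to its own channel, so a two-formant vowel uses two of the four channels. The rounding error can be up to half a semitone, which is probably part of why the result sounds more musical than voice-like.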
To play digitized samples on the Pico-8, I'd probably try a similar procedure: perform some kind of Fourier analysis on the audio at a very coarse sample rate (<30Hz), output the 4 strongest frequencies between 65 and 2489Hz for each sample, then write some code to quantize that data into musical notes for the tracker.
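Here is a rough sketch of that procedure for a single frame, in Python as an offline preprocessing step. It is only an illustration under some assumptions: a naive DFT stands in for "some kind of Fourier analysis", the tracker pitches are taken to be equal-tempered semitones from C2 (about 65.4 Hz), and the function names `freq_to_note` and `strongest_notes` are made up for this example.

```python
import cmath
import math

C2_HZ = 65.406    # assumed frequency of the lowest tracker pitch
MAX_HZ = 2489.0   # assumed frequency of the highest tracker pitch (Eb7)

def freq_to_note(freq_hz):
    """Nearest tracker pitch index (0..63) for a frequency in Hz."""
    return max(0, min(63, round(12 * math.log2(freq_hz / C2_HZ))))

def strongest_notes(frame, sample_rate, count=4):
    """Return up to `count` (pitch, magnitude) pairs for one frame of samples."""
    n = len(frame)
    peaks = []
    for k in range(1, n // 2):
        freq = k * sample_rate / n
        if not (C2_HZ <= freq <= MAX_HZ):
            continue  # skip bins the tracker cannot play
        # Naive DFT bin: slow (O(n^2) per frame) but dependency-free
        bin_value = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                        for i, s in enumerate(frame))
        peaks.append((abs(bin_value), freq))
    peaks.sort(reverse=True)  # strongest bins first
    return [(freq_to_note(f), mag) for mag, f in peaks[:count]]
```

Calling `strongest_notes(frame, 8000)` on each frame would give the four pitches to send to the four channels, with the magnitudes still needing to be scaled to tracker volumes. Two caveats: neighbouring bins can round to the same pitch, and the bin spacing is `sample_rate / len(frame)`, so short frames resolve the low notes poorly.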
electricgryphon— some links for ya:
https://en.wikipedia.org/wiki/Formant (general explanation, first two formant frequencies for most vowels)
http://pages.mtu.edu/~suits/notefreqs.html (frequency of musical notes)
http://www.asel.udel.edu/speech/tutorials/synthesis/vowels.html (web-based formant synth)
EDIT: ... and this audio recording from the 40s, in which "speech synthesis" is performed along the same principles with a harmonica. Our brains are so accustomed to isolating speech that we'll hear it in anything, if the tones are within the rough frequency ranges of formants, and if we're properly primed (with subtitle text, for example).
|To play digitized samples on the Pico-8, I'd probably try a similar procedure: perform some kind of Fourier analysis on the audio at a very coarse sample rate (<30Hz), output the 4 strongest frequencies between 65 and 2489Hz for each sample, then write some code to quantize that data into musical notes for the tracker.|
(Preface: I'm a dabbler in audio at best. My understanding is pretty superficial in most cases and I may not understand correctly in many of them.)
I've been wondering if it's possible to choose sine-wave notes that are power-of-two multiples in frequency, and then maybe use a DCT to decompose a waveform into its coefficients at 120Hz, then choose the four largest coefficients and set the volume and pitch of the four channels accordingly 120 times a second, which appears to be the maximum rate at which you can make changes.
I'm not sure if the frequencies necessary to build up a waveform from coefficients DCT-style even exist in the PICO-8 waveform generator, and I'm betting 4 voices with a volume resolution of 3 bits each would make it sound awful even if so, but hey, this is what I had in my head from numerous nights of staring at the ceiling and then today seeing your post on the subject. ;)
Edit: oh, and I guess you'd need notes whose frequencies are multiples of 120 as well, or you'd get popping, hmm.
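For anyone who wants to poke at the DCT idea offline, here is a small Python sketch of it under stated assumptions: a plain unnormalized DCT-II, one frame per 1/120 s update, and volumes quantized to 0..7 (3 bits). The names `dct2`, `basis_freq`, and `top_voices` are hypothetical. One thing it makes visible: DCT-II basis k completes k half-cycles per frame, so at 120 frames per second the basis frequencies are multiples of 60 Hz, which equal-tempered tracker notes would generally only approximate.

```python
import math

FRAME_RATE = 120  # assumed updates per second, as suggested in the post

def dct2(frame):
    """Unnormalized DCT-II of one frame of samples."""
    n = len(frame)
    return [sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                for i, x in enumerate(frame))
            for k in range(n)]

def basis_freq(k, frame_rate=FRAME_RATE):
    """Frequency in Hz of DCT-II basis k (k half-cycles per frame)."""
    return k * frame_rate / 2

def top_voices(frame, voices=4, max_volume=7):
    """Pick the largest coefficients and return (freq_hz, volume) pairs."""
    coeffs = dct2(frame)
    # Skip k=0, which is only the DC offset of the frame
    ranked = sorted(range(1, len(coeffs)),
                    key=lambda k: abs(coeffs[k]), reverse=True)[:voices]
    if not ranked:
        return []
    peak = abs(coeffs[ranked[0]]) or 1.0  # avoid dividing by zero on silence
    # The sign of each coefficient is dropped here, since this sketch
    # assumes the tracker offers no control over phase
    return [(basis_freq(k), round(max_volume * abs(coeffs[k]) / peak))
            for k in ranked]
```

Feeding each frame through `top_voices` gives four frequency/volume pairs per update. Discarding the sign and rounding to eight volume levels throws away a lot, so this is more a way to hear how bad it gets than a working encoder.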
@Felice: interesting idea. If it's helpful as you work through it, I've put up a Git repository of my first stab at doing arbitrary digitized audio using Fourier analysis: https://github.com/musurca/pico8_fftsampler
(I can get it to work with simple tones, and it will mimic the rhythm and overall tone of more complex waveforms, but mostly it just sounds like very, very, very compressed audio -- which it is.)
@IMLXH: see the Formant Synthesis Demo above, and discussion that follows. In short, it is possible to generate speech with Pico-8 by means of the "talking piano" method, although clarity of articulation is an issue.
Wholly Smokes, that is VERY impressive, @musurca ! You get my star.
Must've taken you ages to figure this out. And yeah, Pico-8's own soundboard leaves a lot to be desired.
Here's hoping @zep will increase the sound standard until it's at least as good as Atari 2600 for the cost of SFX storage.
So double the resolution at double the storage space. Do we really need 63 SFX for every cart?
Cut it in half and give us double the pitches, extra low for white noise (6) so we can have true space engine or rumbling and high-pitch for warble birds, or in this case, higher quality speech.
@musurca I was attempting formant synthesis with my own program, actually. I did it a while ago so I don't know whether I was looking at a vowel table or analyzing recorded audio, but essentially the tone frequencies I used were equal temperament approximations of formants. I found it interesting that the end result turned out so musical.
Also, I LOVE SAM. Which...you might have known if you ever YouTube searched my username. :P
I wanted to have voice synthesis in one of the games our team was planning, and I think it's a lot easier than some may make it out to be. The Apple ][ did it nicely, for instance, and CPU-wise it's a lot worse off than Pico-8.
It won't sound good because of the harsh limitations on changing pitch, but it'll sound good enough that someone can definitely make out the consonants and vowels and what is spoken.
The key I see is not so much the sound itself but the TIMING of the connected phonemes: the way they blend together. The phonemes have to be timed to within a 30th of a second of each other.
Experimentation is key. I want to have it in the game though so no separate library function on this, not yet.
So far the previous examples I have seen are nothing short of brilliant.