I've been messing around with software synth via the PCM trick and I feel like there's sort of a fundamental problem that's detracting a lot from the quality of what comes out the back end, needlessly so, not to mention making it uncomfortable to the listener's ear.
As far as I can tell, you're feeding the 5512.5Hz signal into the audio version of a "nearest neighbor" sampler to upsample it to the host OS's audio hardware sample rate, which is typically gonna be either 44100Hz or 48000Hz. That means repeating the sample values as many times as necessary to fill the gaps. In this case it'd repeat the same value 8 times at 44100Hz (or 8 or 9 times at 48000Hz, since that ratio isn't a whole number). This of course produces stairsteps on the 44100Hz+ "curve".
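For anyone who wants to see the stairsteps as numbers, here's a minimal sketch of that kind of nearest-neighbor upsampling. It's only a guess at the behavior described above, not Pico-8's actual code, and `upsample_nearest` is a name I made up for illustration.

```python
SRC_RATE = 5512.5   # Pico-8 PCM rate
DST_RATE = 44100    # assumed host output rate

def upsample_nearest(samples, factor):
    """Zero-order hold: repeat every input sample `factor` times."""
    out = []
    for s in samples:
        out.extend([s] * factor)
    return out

factor = int(DST_RATE / SRC_RATE)  # whole-number ratio at 44100Hz

# Four input samples become four flat steps at the output rate.
stairs = upsample_nearest([0, 64, 127, 64], factor)
print(stairs)
```

Each input value turns into a flat run, and the jump between runs happens within a single output sample, which is where the sharp edges come from.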
The issue with that is that modern audio hardware has really good fidelity and it's going to reproduce those sharp stairsteps very faithfully, effectively overlaying a constant 5512.5Hz triangle wave on the original wave that pulsates its amplitude based on how big the deltas between samples are in the original wave.
Like, if the original wave was a 262Hz middle-C sine wave tone, you'd get a 5512.5Hz buzz overlaid that gets strongest each time the wave crosses 0, where the rate of change is maximal. It crosses 0 twice per cycle, so you're getting an audible crackling buzz at 524Hz.
That's not what would have happened on old analog audio output circuitry. There would be ramping time between levels, some sort of acceleration, deceleration, and maybe overshoot too, as the line settled at its new level. Basically the audio version of an LCD pixel's "response time" or "gray-to-gray time". The overlaid triangle wave would be closer to a sine wave, more of a tone than a buzz, still present and still distorting the intended wave a bit, but much less harsh to the ear.
I dunno what options you have to change the upsampling. If you're doing it by hand, your options are open, but if you're relying on hardware, you might need to experiment. You probably don't want to change it over to using full interpolation, because that'll make it hard for people to do deliberate buzzing.
If it were me, I'd try doing a manual nearest-neighbor upsample to 22050Hz (edit: or maybe just 11025Hz) to retain the low-resolution stairsteps you'd expect the low clock rates to produce on retro hardware, but I'd let the host OS/hardware do interpolation to upsample it the rest of the way to the true output rate of 44100Hz or 48000Hz, which will simulate the ramping up/down time. This'd take the edge off, for the sake of the listener's ear.
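Here's a rough sketch of that two-stage idea, assuming a 44100Hz output. The second stage uses plain linear interpolation as a stand-in for whatever the OS or hardware resampler really does, so treat it as an illustration only. `hold`, `linear_x2` and `two_stage` are hypothetical names.

```python
def hold(samples, factor):
    """Stage 1: nearest-neighbor upsample (repeat each sample)."""
    return [s for s in samples for _ in range(factor)]

def linear_x2(samples):
    """Stage 2: double the rate by inserting the midpoint between neighbours."""
    out = []
    for i, s in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.append(s)
        out.append((s + nxt) / 2)
    return out

def two_stage(samples):
    # 5512.5Hz -> 22050Hz with hard steps, then 22050Hz -> 44100Hz smoothed
    return linear_x2(hold(samples, 4))

print(two_stage([0, 8]))
```

With this stand-in, the jump from 0 to 8 picks up one in-between value at the final rate instead of landing in a single output sample, while the flat parts of each step stay flat.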
Agreed, the PCM is very harsh right now. I know it's probably not intended to replace the Pico-8 audio system entirely, and might be getting misused by overeager people, but it's headache-inducing right now. A little smoothing would definitely help, and your suggestion at the end seems like a good compromise solution.
I agree that there are major audio quality issues that are easy to stumble into when working with PCM in Pico-8, but I’m pretty convinced that the nearest neighbor upsampling is not the primary cause. It rarely hurts the sound very much, and it can even be somewhat desirable, depending on the use case.
(Side note: I know that I’ve found myself disabling the output smoothing in some cases because, while it is more correct, it produces an unnaturally muffled distribution of output frequencies, and having more high end noise sounds better. If you're generating a lot of low-to-mid-frequency tones, while not having anything that would naturally generate a lot of high end, then it can help to smooth off the highs a bit. But if you're trying to synthesize things that would naturally have high end, like cymbals or hats, then you don't want this extra rolloff, since the aliasing induced by naive upsampling helps fill in frequencies that would be entirely missing otherwise.)
The bigger issue, I believe, is synthesized audio that isn't antialiased: either samples that weren't antialiased properly, or naive (aliasing) oscillators. The Nyquist frequency for Pico-8 PCM is about 2.8kHz, right in the middle of peak human hearing sensitivity, and typically the loudest aliased frequencies are also the lowest. If you do nothing to suppress those just-above-Nyquist frequencies, they will fold back to just under Nyquist and will show up as extremely prominent inharmonic tones in the final audio. Very unpleasant!
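To put numbers on the fold-back, here's a small sketch that works out where the odd harmonics of a naive 262Hz square wave end up after sampling at 5512.5Hz. The helper name `alias_of` is made up, and the 262Hz tone is just an example.

```python
SAMPLE_RATE = 5512.5
NYQUIST = SAMPLE_RATE / 2  # 2756.25Hz, the "about 2.8kHz" above

def alias_of(freq, sample_rate=SAMPLE_RATE):
    """Frequency at which a tone at `freq` shows up once sampled."""
    f = freq % sample_rate
    return sample_rate - f if f > sample_rate / 2 else f

# A square wave only has odd harmonics: 262Hz, 786Hz, 1310Hz, ...
folded = [(262 * n, alias_of(262 * n)) for n in range(1, 22, 2)]
for original, heard in folded:
    print(original, "->", heard)

# The 11th harmonic (2882Hz) is the first one above Nyquist.
# It folds back to 2630.5Hz, which is not a multiple of 262Hz.
```

Everything up to the 9th harmonic passes through untouched. From the 11th onward the harmonics land at frequencies that no longer line up with the fundamental, which is what makes them stick out.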
To be clear, I don’t blame cart authors for ignoring antialiasing. It’s not obvious that you need to do it, doing it well can be fiddly and/or CPU intensive, and it’s much much less of an issue at higher sample rates, so prior experiments with DSP may not provide good intuition.
I’m not sure if there’s anything Zep can do here other than raise the sample rate of the PCM output, which would help move the loudest aliasing frequencies up into a less perceptually prominent frequency range. Any other mitigations are most likely on cart authors' plates.
tl;dr: if your PCM cart sounds bad (as opposed to just kind of muffled or noisy), aliasing is probably why.
Here's a quick demo if you want to check out the effects of aliasing and the built-in upsampling filter:
This cart plays a short melody using a square wave. It takes three different approaches to antialiasing:
- None. Ignore it, always sample the current value of the waveform. This is what you get if you straightforwardly generate a square wave.
- Linear. Treat the square wave discontinuities as linear ramps 1 sample period long.
- polyBLEP. This approach treats the discontinuities as each being made from two spliced-together parabola segments, each segment being one sample period long.
There are also more involved and much more CPU-intensive methods, like additive synthesis, BLIT, minBLEP, etc., but polyBLEP generally works well enough for this case.
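For reference, here's roughly what the three approaches look like in code. This is my own Python sketch of the standard techniques, not the code from the demo cart, and all the function names are made up. `phase` runs from 0 to 1 over one cycle and `dt` is the phase step per sample (frequency divided by sample rate, assumed to be below 0.5).

```python
SAMPLE_RATE = 5512.5  # Pico-8 PCM rate

def square_naive(phase):
    """None: just read the waveform at the current phase."""
    return 1.0 if phase < 0.5 else -1.0

def square_linear(phase, dt):
    """Linear: replace each jump with a ramp one sample period long."""
    if phase < dt:                    # rising edge fell inside the last sample
        return 2.0 * (phase / dt) - 1.0
    if 0.5 <= phase < 0.5 + dt:       # falling edge fell inside the last sample
        return 1.0 - 2.0 * ((phase - 0.5) / dt)
    return square_naive(phase)

def poly_blep(t, dt):
    """Correction around a jump at t == 0: two parabola segments,
    each one sample period long."""
    if t < dt:                        # just after the jump
        x = t / dt
        return x + x - x * x - 1.0
    if t > 1.0 - dt:                  # just before the jump
        x = (t - 1.0) / dt
        return x * x + x + x + 1.0
    return 0.0

def square_polyblep(phase, dt):
    """polyBLEP: naive square plus a correction at each edge."""
    value = square_naive(phase)
    value += poly_blep(phase, dt)                # rising edge at phase 0
    value -= poly_blep((phase + 0.5) % 1.0, dt)  # falling edge at phase 0.5
    return value

def render(freq, count, kind="polyblep"):
    """Generate `count` samples of a square wave at `freq` Hz."""
    dt = freq / SAMPLE_RATE
    phase, out = 0.0, []
    for _ in range(count):
        if kind == "none":
            out.append(square_naive(phase))
        elif kind == "linear":
            out.append(square_linear(phase, dt))
        else:
            out.append(square_polyblep(phase, dt))
        phase = (phase + dt) % 1.0
    return out

samples = render(262, 100)
```

The naive version only ever outputs +1 or -1. The other two output in-between values for the sample or two nearest each edge, which is all the "smoothing" amounts to.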
To my ear, the difference between the built-in upsampling filter being on vs. off is more or less a question of taste: depending on the use case, I might go one way or the other. The 2-point moving average filter used for this purpose is really quite gentle for the most part, with the half-power cutoff at about 5.5kHz, well into the range of frequencies that are only present as upsampling artifacts. So I think it's best regarded as a tool for shaping the tone of the output audio, rather than a noise suppressor.
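The 5.5kHz figure can be checked with the textbook response of a 2-point moving average, whose gain is |cos(pi * f / fs)|. The sketch below assumes the filter runs at 22050Hz. That rate is my assumption (it's the one that puts the half-power point at about 5.5kHz), not something confirmed in this thread.

```python
import math

FILTER_RATE = 22050.0  # assumed rate at which the moving average runs

def moving_avg2_gain(freq, sample_rate=FILTER_RATE):
    """Magnitude response of y[n] = (x[n] + x[n-1]) / 2."""
    return abs(math.cos(math.pi * freq / sample_rate))

gain_at_cutoff = moving_avg2_gain(5512.5)    # half-power point
gain_at_nyquist = moving_avg2_gain(2756.25)  # top of the PCM's own band

print("power at 5512.5Hz:", gain_at_cutoff ** 2)
print("gain at 2756.25Hz:", gain_at_nyquist)
```

Under that assumption the gain at the PCM Nyquist frequency is still about 0.92, so the filter barely touches the band the cart can actually produce, which fits the "quite gentle" description.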
On the other hand, the difference between the naive oscillator and having any antialiasing at all is night and day. (Again, to my ear.) Linear to poly is a noticeable improvement but nowhere near as important as the transition from nothing to linear.
I think my recommendations for cart authors would be:
- If you're playing back samples, make sure they are downsampled with an appropriate antialiasing filter to remove higher frequencies. Most tools should do this for you, but if you're rolling your own downsampling (by, for example, taking one of every N samples) you might run into problems here. If you need to play back samples at varying sample rates, make sure that you apply some form of interpolation or filtering to remove or attenuate frequencies higher than what Pico-8 supports. If you're not sure where to start, linear interpolation is an okay choice, but it will start to suffer dramatically after half an octave or so of detune.
- If you're generating synthetic waveforms, make sure to smooth out any discontinuities. Feel free to steal the code from this demo cart if you want! But basically: any time your waveform makes a jump, you need to smooth that jump out. There are more and less principled ways to do this, but anything you do will be better than doing nothing. If you don't want to think about where your discontinuities are or how to smooth them out, you can just oversample: generate samples at 2x or 4x Pico-8's native sample rate (or even more!) and apply some sort of moving average to the results. The exact form of moving average does matter, if you want to get into the theory, but again, anything is better than nothing.
- If you're doing time-varying or nonlinear effects, like distortion or modulated delays ... consider oversampling. It may not always make a big difference but it's often a good idea to check.
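As a concrete version of the "just oversample" option from the list above, here's a minimal sketch: run a naive oscillator at 4x the PCM rate and average each group of 4 samples down to one. The plain (boxcar) average is the simplest possible choice rather than the best one, and the names `make_naive_square` and `oversampled` are made up.

```python
BASE_RATE = 5512.5  # Pico-8 PCM rate

def make_naive_square(freq, rate):
    """Return a function that yields one naive square-wave sample per call."""
    phase = 0.0
    step = freq / rate
    def osc():
        nonlocal phase
        value = 1.0 if phase < 0.5 else -1.0
        phase = (phase + step) % 1.0
        return value
    return osc

def oversampled(osc, n_out, factor=4):
    """Call `osc` `factor` times per output sample and average the results."""
    out = []
    for _ in range(n_out):
        acc = 0.0
        for _ in range(factor):
            acc += osc()
        out.append(acc / factor)
    return out

# The oscillator runs at 4x the PCM rate; the output is at the PCM rate.
osc = make_naive_square(440.0, BASE_RATE * 4)
samples = oversampled(osc, 64, factor=4)
```

The oscillator itself stays naive. Any output sample that straddles an edge comes out as an in-between value, so you get some smoothing without having to track where the discontinuities are.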
Absolutely agreed that people need to be aware of aliasing artifacts in their sample data, especially if the sample data isn't being provided at 5512.5Hz. This is always an issue in audio engineering, but much more evident when the resolution is so low.
My concern in the OP was more to do with the fact that a real 5512.5Hz DAC wouldn't (and probably couldn't) hold the analog output value perfectly level on the oscilloscope during each 1/5512.5s interval, but because the clock rate of a modern DAC is usually very high in comparison, it's technically capable of coming very close to that perfectly-level interval if you don't manually simulate the behavior of a DAC running at a lower clock frequency. I think zep's subsequent changes have done a decent job of doing so.
TL;DR: You're talking about cleaning up the digital samples going into the virtual DAC, while I was talking about the "analog" waveform coming out of it.
Huh, I just thought of something:
This is actually the same problem we see with perfectly square pixels in "retro" games, where the analog output of a cathode ray splattering electrons on the inside of a tube would smooth the pixel borders, but now we have such precise, high-resolution hardware that they're shockingly, unpleasantly square.
I guess the main point I was trying to make is that the extreme harshness that people seem to perceive in PCM output is much more likely to be due to out-of-control aliasing than failure to roll off frequencies above 2.8kHz after upsampling. So if someone's finding their cart's PCM output too harsh, the resolution is probably within their control.
On the other hand, if you're synthesizing moderately-complex audio and want a natural-looking output spectrum (so things don't sound too muffled), I've found it's generally best to turn off Pico-8's LPF so that quantization noise can help round out the high end. The spectrum will still typically fall off faster than an actual higher-sample-rate recording, but it helps.
(Turning off the LPF is perhaps less faithful in some sense to the concept of a 5.5kHz DAC, but if you want to get really faithful and drop frequencies above ~2.8kHz, the results sound awful. Very muffled sound, strong perceptual resonance at the cutoff ... turns out you really do need some amount of quantization noise here to keep things sounding acceptable. This is a bit like the other direction of your pixel analogy - "correct" upsampling of pixel art generally looks like a blurry mess.)