Weighting the output of FFT

Hi vvvvers

Can anyone suggest a way to weight the output of the FFT node so that it more accurately reflects how our hearing perceives frequency? I.e. a logarithmic curve giving more priority to the lower frequencies.


check: AudioAnalysis (DShow9) and SplitAudioAnalysis (Spreads). they separate the frequencies into 4 bands. also look into FFT (DShow9 4Channels) to see how you can select the samples by frequency.

Am I right in thinking, though, that the bins coming from the FFT node are spaced linearly in frequency rather than in octaves (which would take into account the brain’s perception of pitch vs frequency)?

yes, FFT is always linear and there is nothing one can do about it. i have done some days of research on that at meso in frankfurt. there were some approaches like the constant-Q transform, but in the end it’s always a rescaled FFT…

but is there a way using vvvv to resample (possibly?) that fft spread in a way that makes each octave have an equal number of slices?

would be nice to have a wavelet node

Back on this topic, I’m convinced there is a way to weight the fft output spread to make it more usable (for me at least!)

If you have the facility, try sweeping a sine tone up the audible frequency range, and monitor the loudest fft bin using a sort/CDR combo. You will immediately see how many of the fft bins are used up on the highest frequencies, and how few cover the lower frequencies

(this is relative to our perception of the frequency ranges of course, and I agree that mathematically the FFT is doing what it is supposed to)
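
To put rough numbers on the sine sweep observation, here is a small sketch that counts how many FFT bin centres land in each octave. The 44100 Hz sample rate, the 1024-sample FFT size and the 20 Hz starting point are assumptions picked for illustration, not values read from the vvvv FFT node.

```python
def bins_per_octave(sample_rate=44100, fft_size=1024, f_low=20.0):
    """Count FFT bin centres per octave, starting at f_low (all values assumed)."""
    bin_width = sample_rate / fft_size   # Hz between bin centres, constant (linear spacing)
    nyquist = sample_rate / 2
    counts = []
    lo = f_low
    while lo < nyquist:
        hi = min(lo * 2, nyquist)
        # bin k is centred at k * bin_width; count the centres with lo <= f < hi
        n = sum(1 for k in range(fft_size // 2 + 1) if lo <= k * bin_width < hi)
        counts.append((lo, hi, n))
        lo *= 2
    return counts


if __name__ == "__main__":
    for lo, hi, n in bins_per_octave():
        print(f"{lo:8.0f} - {hi:8.0f} Hz: {n} bins")
```

Under those assumptions the 40 to 80 Hz octave gets a single bin, while the 10240 to 20480 Hz octave gets 238.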

For an explanation of what the heck I’m talking about, have a look at the definition of white noise vs pink noise - http://www.sweetwater.com/expert-center/techtips/d--08/23/2000

At the moment the FFT outputs energy per frequency, whereas I want energy per octave.

Is it not possible to do this (roughly at least) using some clever resampling of the fft spread? The lowest index slices need to be resampled to cover a greater number of output slices and the higher index slices a fewer number of output slices. The tricky bit is this needs to happen as a function of the frequency, but I suppose a manual method of resampling small groups of slices then combining them again may work to an extent, but the result will be quite coarse I imagine.
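
A rough sketch of that kind of resampling, written in Python rather than as a vvvv patch. It treats each FFT bin as covering an equal slice of the frequency axis and shares its value between logarithmically spaced output bands according to the overlap, so every octave ends up with the same number of output slices. The function name `rescale_fft`, the bin layout, the 20 Hz lower limit and the 3 bands per octave are all assumptions for illustration.

```python
import math


def rescale_fft(mags, sample_rate=44100.0, f_low=20.0, bands_per_octave=3):
    """Redistribute linearly spaced FFT bins into log-spaced bands.

    Assumption: bin k covers [k * bin_width, (k + 1) * bin_width) up to Nyquist.
    """
    n_bins = len(mags)
    nyquist = sample_rate / 2.0
    bin_width = nyquist / n_bins
    # whole bands that fit between f_low and Nyquist
    n_bands = int(math.log2(nyquist / f_low) * bands_per_octave)
    out = []
    for i in range(n_bands):
        a = f_low * 2.0 ** (i / bands_per_octave)        # lower band edge in Hz
        b = f_low * 2.0 ** ((i + 1) / bands_per_octave)  # upper band edge in Hz
        first = int(a // bin_width)
        last = min(int(b // bin_width), n_bins - 1)
        total = 0.0
        for k in range(first, last + 1):
            lo = max(a, k * bin_width)
            hi = min(b, (k + 1) * bin_width)
            if hi > lo:
                # fraction of bin k that lies inside this band
                total += mags[k] * (hi - lo) / bin_width
        out.append(total)
    return out
```

At the low end several output bands share a single FFT bin, so the result is coarse there: the resampling changes the layout but cannot add resolution the FFT did not have.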

Another interesting thing to note when sweeping the sine tone through the FFT is that even though the volume stays constant as the (single) frequency changes, the value output from the FFT is much greater when the frequency is higher than when it is lower. I think this would also be fixed by solving the above, and I’m sure you can see how this would be useful…


here is some reading.




once i implemented all this, but when digging into it i realized that you won’t get more information out of the signal than the fft already has. just pack the fft bins together as the fft 4 channels does. you can do that with a + spectral with increasing bin sizes and a scaling proportional to the bin sizes after it.
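
The packing idea can be sketched outside vvvv in a few lines of Python. This is only one reading of the post: the doubling group sizes (1, 1, 2, 4, 8, …) and the choice to divide each sum by its group size are assumptions, and `doubling_sizes` and `pack_bins` are hypothetical helper names, not vvvv nodes.

```python
def doubling_sizes(n_bins):
    """Group sizes 1, 1, 2, 4, 8, ... that together cover n_bins (n_bins >= 1)."""
    sizes = [1]
    size = 1
    while sum(sizes) < n_bins:
        # truncate the last group if n_bins is not a power of two
        sizes.append(min(size, n_bins - sum(sizes)))
        size *= 2
    return sizes


def pack_bins(mags, sizes, average=True):
    """Sum consecutive groups of bins; optionally scale each sum by its group size."""
    out = []
    start = 0
    for size in sizes:
        total = sum(mags[start:start + size])
        # assumption: "scaling proportional to the bin sizes" means dividing by size
        out.append(total / size if average else total)
        start += size
    return out
```

With 512 input bins this gives 10 output bands, one per octave. With `average=False` a flat spectrum comes out rising with frequency, and with `average=True` it stays flat.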

again pushing my wavelet proposal … do you (tf/devs) see a chance that we get this? or, maybe you aren’t convinced that this will push things?

the reason for me wanting this was a talk with a mathematician who does signal transforms all the time. he told me that the (fast) wavelet transform has several advantages over its fourier equivalent, e.g. that it’s more similar to human hearing, which is what we’re talking about here, isn’t it?

here’s a wavelet vs. fourier … for someone who understands the math behind it - cos i don’t.

yes, i also looked into that… my main concern at that time was input delay, which cannot be solved, because to analyse a low-frequency wave you have to wait until it’s there, which takes time and defines the buffer size.
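
As a back-of-the-envelope illustration of that delay, the sketch below computes how many samples one full period of a given frequency occupies and how long that takes. The 44100 Hz sample rate and the "at least one period" rule of thumb are assumptions, and `min_buffer_for` is a hypothetical helper name.

```python
import math


def min_buffer_for(freq_hz, sample_rate=44100, periods=1):
    """Samples and milliseconds needed to capture `periods` cycles of freq_hz."""
    samples = math.ceil(periods * sample_rate / freq_hz)
    latency_ms = 1000.0 * samples / sample_rate
    return samples, latency_ms


if __name__ == "__main__":
    for f in (20, 40, 100, 1000):
        n, ms = min_buffer_for(f)
        print(f"{f} Hz: {n} samples, {ms:.1f} ms")
```

Under these assumptions a 20 Hz wave needs 2205 samples, i.e. 50 ms, before a single period has even arrived.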

anyways, the wavelet transformation is really an interesting alternative, will look into it next week…

perhaps i can provide some matlab code, if this is useful for you?

yes, any code you have/find will be useful… i do not intend to write the algorithm from scratch, but find the best implementation for our tasks.

ok, did some tests… same conclusion as before. fft is fast and has all the info. the wavelet transform has only 10 bands if the buffer size is 512, 11 at 1024 and so on… rescaling the FFT gives almost the same results as the wavelet transform for our audio analysis tasks. wavelets would make more sense if we had very big buffers… but we don’t want big buffers because of the latency.

here are some tests, it includes a RescaleFFT plugin, which does what mrboni was referring to.

wavelet.zip (88.2 kB)

Thanks Ton, will check it out.

Might a combination fft / wavelet approach work using the wavelet analysis only for frequencies above a threshold, avoiding large buffer sizes, and fft for the remaining low frequencies?

it’s not a frequency problem with wavelets, it’s just that it has very few bands. only 11 with a buffer of 1024 samples, 12 with 2048, 13 with 4096 and so on…
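
Those band counts match what a plain dyadic wavelet decomposition gives: log2(N) detail levels plus one final approximation. The sketch below uses a full-depth Haar transform to show this. The thread does not say which wavelet the test plugin uses, so Haar is an assumption chosen because it is the simplest to write down, and `haar_bands` is a hypothetical name.

```python
import math


def haar_bands(samples):
    """Full-depth Haar DWT of a power-of-two length buffer.

    Returns [approximation, coarsest detail, ..., finest detail].
    """
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("buffer length must be a power of two")
    approx = list(samples)
    details = []
    root2 = math.sqrt(2.0)
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # detail = scaled difference of neighbours, approximation = scaled sum
        details.append([(a - b) / root2 for a, b in pairs])
        approx = [(a + b) / root2 for a, b in pairs]
    return [approx] + details[::-1]


if __name__ == "__main__":
    for size in (512, 1024, 2048, 4096):
        print(size, len(haar_bands([0.0] * size)))
```

Each doubling of the buffer adds exactly one band, which is why larger buffers help so little here.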

ah damn… otherwise the wavelet-output looks quite promising - just listened to some piano music and there, you can almost watch the piano-keys

…but of course the bands-problem is a pity

ooh exciting, gonna have a good look through that now

(disclaimer: i have not read deeply into it, so disregard what i say if i’m wrong, but) shouldn’t the number of bands also depend on the form of the wavelet? i.e. slimmer wavelet, more bands?