FFT node - observations

I’ve been comparing the output of the VVVV FFT node with my C audio processing programme which uses the FFTW library.

The VVVV documentation doesn’t really spell out precisely what the FFT calculates. Here is what I think it does, based on my observations. Would one of the VVVV source crew like to confirm?

  • As you would expect, the output is a function of the log of the power density. (EDIT: As clarified below, this isn’t right. No log is applied; the output is the magnitude of the complex FFT output.)
  • There is an (adaptive?) noise filter that rescales the output to cut off a large amount of low-power noise that would otherwise appear.
  • The oddest feature is that there appears to be an EQ filter applied which increases the response at higher frequencies. If you feed the FFT with white noise you get a sloping output.

please upload a picture of the white noise slope, probably the fft output is not normalized… for your other questions one has to look into the code. have you switched off the windowing function?

Attached is the test patch and the function generator I am using.

Select the “stereo mixer” as your recording input and then run the function generator. Set the left amplitude to the mid point and click on “white noise”.

I get a definite bulge on the right-hand side of the FFT graph from the VVVV patch. On the FFT I have in C the graph is flat.

FFT.zip (27.1 kB)

I get the same bulge.
What I want to know is how to read lower frequencies.
The fft seems happy to pick up high frequencies, but I want to work with a human singing voice. How can I get the fft to do some pitch recognition for this?

Thanks

The FFT does respond to lower frequencies. If my measurements are right, they are just attenuated with respect to the higher frequencies. I guess you could simply apply a scaling factor after the FFT to try to correct this.

The FFT output is (normally) bins of equal frequency size. The maximum frequency is half the sampling rate used to capture the signal. From this you should be able to work out which bins contain voice frequencies. Depending on what you want to do you will then need to apply some processing algorithm. Perhaps look for the bin with the most power as a first approximation.
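As a rough illustration of the bin arithmetic described above, here is a minimal Python sketch. The 44100 Hz sample rate, the 1024-sample buffer, the 80 to 1000 Hz "voice" range and all function names are assumptions made for the example, not values taken from VVVV.

```python
import math

def bin_frequency(k, sample_rate, fft_size):
    """Centre frequency in Hz of bin k for an fft_size-point transform."""
    return k * sample_rate / fft_size

def bins_for_range(f_lo, f_hi, sample_rate, fft_size):
    """Inclusive (first, last) bin indices whose centres lie in [f_lo, f_hi]."""
    width = sample_rate / fft_size
    first = math.ceil(f_lo / width)
    # Never go past the last usable bin at half the sampling rate.
    last = min(math.floor(f_hi / width), fft_size // 2)
    return first, last

def peak_frequency(magnitudes, first, last, sample_rate, fft_size):
    """Frequency of the strongest bin in [first, last]: a crude pitch guess."""
    k = max(range(first, last + 1), key=lambda i: magnitudes[i])
    return bin_frequency(k, sample_rate, fft_size)

# Assumed capture settings (hypothetical, adjust to your own setup).
SAMPLE_RATE = 44100
FFT_SIZE = 1024

first, last = bins_for_range(80.0, 1000.0, SAMPLE_RATE, FFT_SIZE)
print("voice bins:", first, "to", last)
```

With these assumed settings only about twenty bins cover the chosen voice range, which is why picking the strongest bin is a first approximation rather than real pitch detection.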

confirmed, the output gets scaled. i’ve now added an input pin to enable/disable the scaling. its default value is enabled, to keep backward compatibility. any better idea?

the fft frequency bins have a linear distribution. to get a better low frequency analysis you have to increase the spreadcount and the audio input buffer size.
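To put numbers on that, the sketch below shows how the width of each linear bin shrinks as the buffer grows. It assumes a 44100 Hz sample rate and that the bin width equals the sample rate divided by the buffer size, which is the standard DFT relation; how the spread count maps onto the buffer size inside VVVV is not covered here.

```python
import math

def bin_width_hz(sample_rate, buffer_size):
    """Spacing in Hz between adjacent FFT bins."""
    return sample_rate / buffer_size

def bins_below(freq_hz, sample_rate, buffer_size):
    """Number of non-DC bins whose centre frequency is at or below freq_hz."""
    return math.floor(freq_hz / bin_width_hz(sample_rate, buffer_size))

SAMPLE_RATE = 44100  # assumed, not taken from the patch

for size in (256, 1024, 4096):
    width = bin_width_hz(SAMPLE_RATE, size)
    count = bins_below(1000.0, SAMPLE_RATE, size)
    print(f"buffer {size}: {width:.2f} Hz per bin, {count} bins below 1 kHz")
```

Under these assumptions a 256-sample buffer gives roughly 172 Hz per bin, while 4096 samples give roughly 11 Hz per bin, at the cost of a longer analysis window and therefore slower response.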

@tonfilm: cool. i think the additional pin is the best solution.

I would go along with the new pin solution as well.

Can you also comment on how you are doing a lower-threshold on the FFT output to stop it displaying very-low-power noise?

before the fft there is a sine window function, after the fft a scaling factor, which is 1/(sliceNr+1). the fft in earlier versions had 0 for the first slice.
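Read literally, that description could be sketched as follows. This is not the VVVV source: the exact sine window formula, the assumption that sliceNr is a zero-based bin index, and all function names are guesses made for illustration.

```python
import math

def sine_window(n_samples):
    """Half-sine window. The exact formula used by VVVV is an assumption."""
    return [math.sin(math.pi * (i + 0.5) / n_samples) for i in range(n_samples)]

def apply_window(samples):
    """Multiply the input buffer by the window before the transform."""
    window = sine_window(len(samples))
    return [s * w for s, w in zip(samples, window)]

def apply_scaling(magnitudes):
    """Scale bin k by 1/(k+1), taking sliceNr as the zero-based bin index."""
    return [m / (k + 1) for k, m in enumerate(magnitudes)]

def undo_scaling(scaled):
    """Inverse of apply_scaling, for comparing against an unscaled FFT."""
    return [s * (k + 1) for k, s in enumerate(scaled)]

flat = [6.0, 6.0, 6.0]
print(apply_scaling(flat))                # [6.0, 3.0, 2.0]
print(undo_scaling(apply_scaling(flat)))  # [6.0, 6.0, 6.0]
```

If the scaling really has this form, multiplying each bin by (sliceNr+1) should recover values comparable to an unscaled FFT such as FFTW, apart from the effect of the window.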

Thanks tonfilm. I think that aspect is now clear.

Just one more question (well maybe a supplementary as well):
Is the FFT output a power spectrum which is proportional to the audio power in each bin, or is it a log(power) spectrum which is proportional to the log of the power in each bin?

In other words when you get the FFT output do you take a log before sending it to the FFT patch output?

If the output is log(power) what have you taken as the zero point and is it adaptive to the input or just fixed?

the output is the magnitude of the complex FFT output. there is no log function.
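For anyone comparing against their own FFT code, the three candidate outputs discussed in this thread differ as in the sketch below. It uses a slow, naive DFT from the Python standard library purely for illustration; the function names and the 1e-12 floor used to avoid log(0) are assumptions.

```python
import cmath
import math

def dft(samples):
    """Naive DFT of a real signal, returning bins 0..N/2 as complex values."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(samples))
        for k in range(n // 2 + 1)
    ]

def magnitude(spectrum):
    """|X[k]|: what the node is said to output (before any scaling)."""
    return [abs(c) for c in spectrum]

def power(spectrum):
    """|X[k]|^2: a power spectrum, which the node does not output."""
    return [abs(c) ** 2 for c in spectrum]

def to_db(magnitudes, floor=1e-12):
    """20*log10(|X[k]|): a log spectrum; the floor is an arbitrary assumption."""
    return [20.0 * math.log10(max(m, floor)) for m in magnitudes]

# A unit-amplitude sine that falls exactly on bin 4 of a 32-point transform.
N = 32
signal = [math.sin(2 * math.pi * 4 * i / N) for i in range(N)]
mags = magnitude(dft(signal))
print(round(mags[4], 6))  # N/2 = 16.0 for a unit sine centred on a bin
```

A log(power) or dB display has to be computed from the magnitudes afterwards, and its zero point is whatever reference level you choose.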

Thanks. That clears everything up for me!