More about the FFT
Windowing
We used the specgram function that is built into Matlab to
produce spectrograms of the data. Specgram takes short segments
of a time sample, applies a window to each segment, and then takes the
FFT of each windowed segment. The previous section mentioned that the FFT assumes
that the given sample repeats periodically. The end of the original sample
and the beginning of the first repetition are unlikely to line up
very well; in fact, there will most likely be a discontinuity at the
boundary. The
discontinuity causes problems because the FFT, in trying to model
the repeated signal exactly, adds many high-frequency components to
reproduce the sharp jump from the end of the original sample
to the beginning of the first repetition. However, the
discontinuity is not present in the original signal, so the
FFT gives an inaccurate picture of the signal's true frequency content.
This spreading of energy into frequencies that are not really there is
known as spectral leakage.
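The effect of the boundary discontinuity can be illustrated with a small sketch. It is a hypothetical example, not taken from our analysis: it uses a naive DFT (no FFT library) on a 64-sample cosine that completes either exactly 8 cycles (the repetition joins smoothly) or 8.5 cycles (the repetition jumps). The helper names `dft_magnitudes` and `significant_bins` and the 1% threshold are illustrative choices.

```python
import cmath
import math

def dft_magnitudes(x):
    """Naive O(N^2) DFT; magnitudes for the non-negative frequency bins 0..N/2."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def significant_bins(mags, rel=0.01):
    """Count bins whose magnitude exceeds `rel` times the largest magnitude."""
    peak = max(mags)
    return sum(1 for m in mags if m > rel * peak)

N = 64
results = {}
for cycles in (8, 8.5):
    x = [math.cos(2 * math.pi * cycles * t / N) for t in range(N)]
    # Value the true tone would take at t = N, where the assumed repetition starts.
    continued = math.cos(2 * math.pi * cycles)
    jump = abs(continued - x[0])  # size of the discontinuity at the boundary
    results[cycles] = (jump, significant_bins(dft_magnitudes(x)))
    print(f"{cycles} cycles: boundary jump {jump:.3f}, "
          f"bins above 1% of peak: {results[cycles][1]}")
```

With whole cycles there is no jump and the energy sits in a single bin; with the extra half cycle the repetition jumps and every bin picks up energy, even though the tone contains only one frequency.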
To minimize this discontinuity, the time samples were
multiplied by a window. The Hanning window, which we used,
is bell-shaped and looks a lot like a Gaussian. This window reduces
the discontinuity because it starts and ends very close to zero, so
the end of one repetition and the beginning of the next both taper to
nearly zero and therefore line up.
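The segment, window, FFT procedure and the benefit of the window can be sketched as follows. This is not Matlab's specgram: it is a minimal illustration using a naive DFT, the periodic Hann window 0.5 - 0.5cos(2πn/N) (one common definition; Matlab's window functions may differ slightly at the endpoints), a 50% frame overlap chosen arbitrarily, and hypothetical helper names. It compares how much energy lands more than three bins away from the peak with and without the window, for a tone that does not complete a whole number of cycles per frame.

```python
import cmath
import math

def hann(n):
    """Periodic Hann window: starts at 0 and tapers back toward 0 at the end."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]

def dft_magnitudes(x):
    """Naive O(N^2) DFT magnitudes for the non-negative frequency bins 0..N/2."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def short_time_spectra(x, frame_len, hop, window=None):
    """Cut x into frames, multiply each by the window, return one spectrum per frame."""
    w = window if window is not None else [1.0] * frame_len  # None = no window
    spectra = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = [x[start + t] * w[t] for t in range(frame_len)]
        spectra.append(dft_magnitudes(frame))
    return spectra

def far_leakage(mags, guard=3):
    """Fraction of spectral energy more than `guard` bins from the strongest bin."""
    peak = max(range(len(mags)), key=lambda k: mags[k])
    total = sum(m * m for m in mags)
    far = sum(m * m for k, m in enumerate(mags) if abs(k - peak) > guard)
    return far / total

N = 64
# 8.5 cycles per frame: each frame ends mid-cycle, so its repetition is discontinuous.
x = [math.sin(2 * math.pi * 8.5 * t / N) for t in range(4 * N)]

rect = short_time_spectra(x, N, N // 2)
windowed = short_time_spectra(x, N, N // 2, hann(N))
print("frames:", len(rect))
print("no window, far leakage:  ", round(far_leakage(rect[0]), 4))
print("Hann window, far leakage:", round(far_leakage(windowed[0]), 6))
```

The window does not remove leakage entirely (it widens the main peak to a few bins), but it sharply reduces the spurious energy far from the true frequency.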
Are Short Time FFTs Reasonable?
We need to make sure that an FFT taken over a short
time gives a reasonable approximation of the note. To check this,
we compared the FFT taken over the entire note with
the FFT taken over just a short segment of it.
The figure above shows the FFT of an F#4 played legato with vibrato.
The blue line is the FFT taken over the entire note (about
half a second), and
the red line is an FFT of 4096 samples from the middle
of the note. Since our sampling frequency is 44.1 kHz, the red
line covers 4096 / 44100 ≈ 0.093 s. However,
the two lines are hard to compare directly because one
(the longer, blue one) has many more data points than the other.
So what do we do? We normalize.
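The arithmetic behind the segment length, and why the two curves have different numbers of points, can be checked with a short sketch. The 44.1 kHz rate and the 4096-sample segment come from the text; the whole-note length `LONG_N` (five times the short segment, about 0.46 s) is a hypothetical stand-in for "about half a second", and `middle_segment` is an illustrative helper.

```python
FS = 44_100            # sampling frequency from the text, in Hz
SHORT_N = 4096         # samples in the short-time FFT (from the text)
LONG_N = 5 * SHORT_N   # hypothetical whole-note length, about 0.46 s

def middle_segment(x, n):
    """Return n consecutive samples centred in x (assumes len(x) >= n)."""
    start = (len(x) - n) // 2
    return x[start:start + n]

def segment_stats(n, fs=FS):
    """Duration in seconds, bin spacing in Hz, and number of bins from 0 Hz to fs/2."""
    return n / fs, fs / n, n // 2 + 1

for label, n in (("short", SHORT_N), ("long", LONG_N)):
    duration, df, bins = segment_stats(n)
    print(f"{label}: {duration:.3f} s, bin spacing {df:.2f} Hz, {bins} bins")
```

Because the bin spacing is the sampling frequency divided by the number of samples, a segment five times longer has bins five times closer together, and therefore five times as many points over the same frequency range.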
The figure above shows the first normalization we tried,
which scales each FFT so that its magnitudes sum to one.
This looks a little odd, because almost
everywhere the red line is higher than the blue line. Even
in the sections where both are very small, the red line is still
higher than the blue line. The reason is
that the long time sample has almost 5 times as
many data points as the short time sample, so its total of one is
spread over almost 5 times as many frequency bins and each bin is
correspondingly smaller.
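The height mismatch produced by sum-to-one normalization can be reproduced without any recorded data. This sketch samples one hypothetical smooth spectral curve (Gaussian bumps at multiples of 370 Hz, roughly the F#4 fundamental, with decreasing harmonic strength) on two grids: the 4096-sample bin spacing and a spacing five times finer, standing in for the longer FFT. The curve, its widths, and the helper names are illustrative assumptions, not measurements.

```python
import math

def spectral_shape(f):
    """Hypothetical smooth magnitude curve with peaks at 370 Hz and its harmonics."""
    return sum(math.exp(-((f - h * 370.0) / 15.0) ** 2) / h for h in range(1, 6))

def sample_curve(df, f_max=2000.0):
    """Sample the curve on a grid with spacing df, like an FFT's frequency bins."""
    return [spectral_shape(k * df) for k in range(int(f_max / df) + 1)]

def normalize_sum(mags):
    """First normalization from the text: scale so the magnitudes sum to one."""
    total = sum(mags)
    return [m / total for m in mags]

short_df = 44_100 / 4096   # bin spacing of the 4096-sample FFT
long_df = short_df / 5     # hypothetical longer FFT with five times as many bins

short = normalize_sum(sample_curve(short_df))
long_ = normalize_sum(sample_curve(long_df))

# Compare the two curves at the same frequency (bin index differs by a factor of 5).
k_short = round(370.0 / short_df)
ratio = short[k_short] / long_[5 * k_short]
print(f"short/long height at {k_short * short_df:.0f} Hz: {ratio:.2f}")
```

Even though both grids sample exactly the same curve, the coarser (short-time) version comes out about five times higher at every frequency, which matches the behaviour described for the red and blue lines.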
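The normalization shown above scales each curve so that the area under it
equals one. This is done by dividing each magnitude by
the sum of all the magnitudes times the frequency resolution
(the spacing between frequency bins, which is the sampling
frequency divided by the number of samples).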
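Written out, the area normalization and the reason it removes the dependence on the number of samples look like this. Here |X_k| is the FFT magnitude in bin k, N the number of samples and f_s the sampling frequency; the second part rests on the simplifying assumption (not established in the text) that both FFTs trace the same smooth curve S(f) up to an overall scale factor c.

```latex
% Area normalization: divide by (sum of magnitudes) x (frequency resolution)
\Delta f = \frac{f_s}{N}, \qquad
\hat{X}_k = \frac{|X_k|}{\Delta f \sum_j |X_j|}
\quad\Longrightarrow\quad
\sum_k \hat{X}_k \,\Delta f = 1 .

% Assume |X_j| \approx c\,S(f_j); then \sum_j |X_j| \approx \frac{c}{\Delta f}\int S(f)\,df, so
\hat{X}_k \approx \frac{S(f_k)}{\int S(f)\,df}
\qquad\text{whereas}\qquad
\frac{|X_k|}{\sum_j |X_j|} \approx \frac{S(f_k)\,\Delta f}{\int S(f)\,df} .
```

The sum-to-one values carry a factor of Δf, so the short FFT (with roughly five times larger Δf) sits roughly five times higher; the area-normalized values do not depend on Δf, so the two curves can be compared directly.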
Conclusions
To draw meaningful conclusions from the data, we must use the
proper normalization. The method that makes the most sense
in our case is the second normalization, because it accounts
for the fact that the long and short time samples have different
numbers of data points. Overall, the patterns of harmonics look
similar for both the long and short time samples. From this observation,
we conclude that it is reasonable to use short time FFTs with our
specified parameters
for violin pitch detection.