Onset components and noise components are first extracted from the frequency spectrum calculated by the Fast Fourier Transform. Onset-time finders then detect onset times in different frequency ranges and with different sensitivity levels. In parallel, another process, a drum-sound finder, detects bass drum (BD) and snare drum (SD) sounds.
Frequency components whose power has been rapidly increasing are extracted as onset components. The onset components and their degree of onset (rapidity of increase in power) are obtained by a process that takes into account the power present in nearby time-frequency regions.
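The neighbourhood comparison described above can be sketched as follows. This is a minimal illustration, not the BTS implementation: the function name `extract_onset_components`, the exact neighbourhood (the previous two frames at and around `f`, and the next frame around `f`), and the formula for the degree of onset are all assumptions chosen to show the idea of judging a rise in power against nearby time-frequency regions.

```python
import numpy as np


def extract_onset_components(power: np.ndarray) -> np.ndarray:
    """Return a degree-of-onset array d[t, f] from a power spectrogram power[t, f].

    Hypothetical rule (an assumption, not taken from the text): a component at
    (t, f) is an onset if its power exceeds the largest power in the preceding
    frames around f, and the power in the next frame around f stays above that
    earlier level, so that brief spikes are ignored.
    """
    T, F = power.shape
    d = np.zeros_like(power, dtype=float)
    for t in range(2, T - 1):
        for f in range(1, F - 1):
            # Largest earlier power in a small time-frequency neighbourhood.
            prev_max = max(power[t - 1, f - 1], power[t - 1, f],
                           power[t - 1, f + 1], power[t - 2, f])
            # Smallest power in the following frame near f: the rise must persist.
            next_min = min(power[t + 1, f - 1], power[t + 1, f], power[t + 1, f + 1])
            if power[t, f] > prev_max and next_min > prev_max:
                # Degree of onset: how far the power rose above the earlier level.
                d[t, f] = max(power[t, f], power[t + 1, f]) - prev_max
    return d
```

For a spectrogram that is silent until frame 3 and then steady, only frame 3 receives a nonzero degree of onset; the steady frames after it do not, because their power no longer rises above the neighbourhood.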
BTS extracts noise components as a preliminary step to detecting SD. Because non-noise sounds typically have harmonic structures and peak components along the frequency axis, frequency components whose power is roughly uniform locally are extracted and considered to be potential SD sounds.
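One way to express "power roughly uniform locally" is to compare each frequency bin with its neighbours along the frequency axis. The sketch below assumes a hypothetical `tolerance` (relative deviation allowed) and `half_width` (number of neighbouring bins checked on each side); neither value nor the exact test is given in the text. Harmonic peaks fail the test because their neighbours are much weaker, while flat, broadband frames pass it.

```python
import numpy as np


def extract_noise_components(power, tolerance=0.5, half_width=2):
    """Keep components whose power is roughly uniform across nearby frequencies.

    Assumption (hypothetical): bin f in frame t is 'noise' if every bin within
    +/- half_width has power within a factor (1 +/- tolerance) of power[t, f].
    Returns the power of noise components, with zeros elsewhere.
    """
    power = np.asarray(power, dtype=float)
    T, F = power.shape
    noise = np.zeros((T, F), dtype=bool)
    for f in range(half_width, F - half_width):
        centre = power[:, f]
        window = power[:, f - half_width:f + half_width + 1]
        lo = (1.0 - tolerance) * centre[:, None]
        hi = (1.0 + tolerance) * centre[:, None]
        # A bin is locally flat if all its neighbours lie inside [lo, hi].
        flat = np.all((window >= lo) & (window <= hi), axis=1)
        noise[:, f] = flat & (centre > 0)
    return np.where(noise, power, 0.0)
```

Bins closer than `half_width` to either edge of the spectrum are never marked as noise in this sketch, which keeps the neighbourhood check simple.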
Fourteen onset-time finders use different sets of frequency-analysis parameters. Each finder sends its onset information to a particular agent-pair. Each onset time is the time of a peak found by peak-picking $D(t)$ along the time axis, where $D(t) = \sum_f d(t,f)$ and $d(t,f)$ is the degree of onset of frequency $f$ at time $t$. The sum $D(t)$ is linearly smoothed with a convolution kernel before its peak times are calculated.
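A single onset-time finder can be sketched in three steps: sum $d(t,f)$ over the finder's frequency range to get $D(t)$, smooth $D(t)$ with a convolution kernel, and pick local maxima. The frequency range `f_lo:f_hi`, the default triangular kernel, and the `threshold` parameter are hypothetical stand-ins for the finder's analysis parameters, which the text does not specify.

```python
import numpy as np


def onset_times(d, f_lo, f_hi, kernel=None, threshold=0.0):
    """Sketch of one onset-time finder.

    d: degree-of-onset array d[t, f].
    f_lo, f_hi: frequency-bin range covered by this finder (hypothetical).
    kernel: smoothing kernel; a short triangular kernel is assumed by default.
    Returns (list of peak frames, smoothed D).
    """
    # D(t) = sum over f of d(t, f), restricted to this finder's range.
    D = d[:, f_lo:f_hi].sum(axis=1)
    if kernel is None:
        kernel = [1.0, 2.0, 1.0]
    kernel = np.asarray(kernel, dtype=float)
    kernel = kernel / kernel.sum()
    # Linear smoothing by convolution; output has the same length as D.
    D_smooth = np.convolve(D, kernel, mode="same")
    # Peak-picking: local maxima along the time axis above the threshold.
    peaks = [
        t for t in range(1, len(D_smooth) - 1)
        if D_smooth[t] > threshold
        and D_smooth[t] >= D_smooth[t - 1]
        and D_smooth[t] > D_smooth[t + 1]
    ]
    return peaks, D_smooth
```

Running fourteen such finders with different `f_lo`, `f_hi`, kernels, and thresholds would give onset estimates with different frequency ranges and sensitivities, in the spirit of the design described above.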
A drum-sound finder detects BD from the onset components and SD from the noise components. Note that BTS cannot simply use the detected drums to track beats, because the results of this detection include many mistakes. The detected drums are used only to label a beat time with its beat type.
[Detecting onset times of BD]
Because the sound of BD is not known in advance, BTS learns the characteristic frequency of BD for each song. The finder finds peaks in the onset components along the frequency axis and builds a histogram of their frequencies (Figure 2). It then judges that BD has sounded at times when an onset's peak frequency coincides with the characteristic frequency, which is given by the lowest-frequency peak of the histogram.
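The BD procedure can be sketched as: find frequency peaks of the onset components in each frame, histogram those peak frequencies, take the lowest-frequency histogram peak as the characteristic frequency, and flag frames whose onset peaks coincide with it. In this sketch the peak definition (simple local maxima), building the histogram over the whole input at once, and the `tolerance` for "coincides" are assumptions; the function names are hypothetical.

```python
import numpy as np


def frequency_peaks(frame):
    """Indices of local maxima along the frequency axis (edges excluded)."""
    return [
        f for f in range(1, len(frame) - 1)
        if frame[f] > 0 and frame[f] >= frame[f - 1] and frame[f] > frame[f + 1]
    ]


def detect_bd(d, tolerance=0):
    """Return (characteristic_bin, bd_frames) from a degree-of-onset array d[t, f]."""
    T, F = d.shape
    peaks_per_frame = [frequency_peaks(d[t]) for t in range(T)]
    # Histogram of onset peak frequencies over the input (assumed: all frames).
    hist = np.zeros(F, dtype=int)
    for peaks in peaks_per_frame:
        for f in peaks:
            hist[f] += 1
    # Characteristic BD frequency: the lowest-frequency peak of the histogram.
    hist_peaks = frequency_peaks(hist)
    if not hist_peaks:
        return None, []
    char_bin = hist_peaks[0]
    # BD is judged to sound where an onset peak coincides with that frequency.
    bd_frames = [
        t for t, peaks in enumerate(peaks_per_frame)
        if any(abs(f - char_bin) <= tolerance for f in peaks)
    ]
    return char_bin, bd_frames
```

Choosing the lowest-frequency histogram peak rather than the tallest one reflects the rule stated above: even if a higher-frequency onset occurs more often, the lowest recurring onset frequency is taken as BD.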
[Detecting onset times of SD]
BTS detects noise components that are widely distributed along the frequency axis as SD. First, the noise components are quantized (Figure 2). Second, the finder calculates how widely the quantized noise components are distributed along the frequency axis (the degree of wide distribution, c(t)). Finally, the onset time of SD is obtained by peak-picking c(t) in the same way as in the onset-time finders.
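The three SD steps can be sketched as below. The quantization scheme (pooling bins into `n_bands` equal-width bands and thresholding each band's mean noise power) and the definition of c(t) as the fraction of active bands are assumptions made for illustration; the text does not give either. Smoothing before peak-picking is omitted for brevity, and `min_c` is a hypothetical threshold.

```python
import numpy as np


def degree_of_wide_distribution(noise, n_bands=8, threshold=0.1):
    """Hypothetical c(t): fraction of coarse frequency bands containing noise.

    noise: noise-component power noise[t, f] (zeros where there is no noise).
    Quantization (an assumption): bins are pooled into n_bands bands, and a
    band is 'on' if its mean noise power exceeds threshold.
    """
    T, F = noise.shape
    bands = np.array_split(np.arange(F), n_bands)
    quantized = np.stack(
        [noise[:, b].mean(axis=1) > threshold for b in bands], axis=1
    )
    return quantized.mean(axis=1)  # c(t) in [0, 1]


def detect_sd(noise, n_bands=8, threshold=0.1, min_c=0.5):
    """Peak-pick c(t) like an onset-time finder; returns SD onset frames."""
    c = degree_of_wide_distribution(noise, n_bands, threshold)
    return [
        t for t in range(1, len(c) - 1)
        if c[t] > min_c and c[t] >= c[t - 1] and c[t] > c[t + 1]
    ]
```

A frame whose noise spans the whole spectrum gets c(t) = 1 and is reported, whereas a frame with noise in only one band gets a low c(t) and is ignored, which matches the idea of treating broadband noise as SD.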
Masataka Goto