Pitchtracker

Pitchtracker is a routine for tracking the fundamental pitch trajectory of a sound. It is an experimental routine that works, I believe, but certainly has its quirks. Three detection methods are available, following 1) the fundamental of the harmonic collection, 2) the strongest formant, or 3) the band-limited centroid. Different output formats let you see, hear, and eventually use the fruits of your pitch tracking.



Amplitude Envelope Warp
Analysis Frames per Second
Begin Time in Seconds
Compression Threshold in dB
Data Type
Decibels of Compression
Detection Method
Detection Threshold in dB
Detection Window Size Maximum in Seconds
Detection Window Size Minimum in Seconds
End Time in Seconds
Envelope Attack
Envelope Release
FFT Length
Frequency Shift Factor
Gate Threshold in dB
High Frequency/Pitch Boundary
Low Frequency/Pitch Boundary
Multiple Channel Method
Output Data Format
Output Data Type
Output Samples per Second
Pitch Trajectory Smoothing: Frequency Response Time
Print Elapsed Time
Reference Frequency
Resynthesis Channel
Window Size in Samples

 

Amplitude Envelope Warp

Many of the routines employ the principle of warping, in which a distribution of values is transformed by a transfer function. In these places an exponential function is employed to remap a 0-1 range of values into a new orientation that preserves the minimum (0) and maximum (1) while bringing the distribution closer to either extreme as a result of the curvature of the exponential function selected. The curvature of the exponential function is selected through a warp index. Specifically, warp index w will reorient the input x through the function below (^ = exponentiation).

y = (1. - (e^(x * w))) / (1. - (e^w))

In this function, a warp index of 0 produces a linear function and an untransformed output (the formula itself evaluates to 0/0 at w = 0, so this is its limiting case, y = x). Positive warp index values of increasing magnitude produce curves of increasing concavity (increasing slope) that draw values towards the minimum of 0 and reduce the function's integral. Negative values do the opposite, drawing values towards the maximum of 1 and increasing the integral.
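
The warp function above can be tried out with a few lines of Python. This is only a minimal sketch: the name warp and the tolerance used to detect a zero warp index are my own choices for illustration, not part of the routine.

```python
import math

def warp(x, w):
    """Remap x in the 0-1 range through the exponential warp curve with index w."""
    if abs(w) < 1e-9:
        # At w = 0 the formula is 0/0; its limit is the straight line y = x.
        return x
    return (1.0 - math.exp(x * w)) / (1.0 - math.exp(w))

# Compare a negative, a zero, and a positive warp index at a few input values.
for w in (-4.0, 0.0, 4.0):
    print(w, [round(warp(x, w), 3) for x in (0.25, 0.5, 0.75)])
```

With a positive index the printed values fall below the inputs (pulled towards 0), and with a negative index they rise above them (pulled towards 1), while 0 and 1 themselves always map to 0 and 1.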

The practical use of this mechanism is found in various places. One such place is the reshaping of the frequency response distribution characteristics. In this, positive warp indices cause the peaks of the response to be accentuated while the weaker frequencies are expanded out (i.e. pushed towards 0). Negative values have the opposite effect, as they compress the dynamic range of the response and raise the relative level of the weaker noise components. Another place where warp applies is in the remapping of FFT amplitudes through the spectrum warpshape. In this, the successive FFT frames have their amplitudes remapped by the warp function, similarly expanding or compressing the dynamic range depending upon the warp specified; 0 (linear warp function) leaves the amplitudes unchanged.


Analysis Frames per Second

This controls how often the phase vocoder will perform an analysis on the signal. It is a translation of the classic decimation control that specifies how many samples to skip between analysis frames. More frames increase the time resolution but decrease processing speed. 200 frames per second is a good reference point. If you expand time, you should increase this proportionately to maintain about 200 or more frames per second.
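
The relation between frames per second and the classic decimation value can be sketched as below. The function names are hypothetical, and rounding to a whole number of samples is an assumption; the routine may translate the value differently.

```python
def hop_size(sample_rate, frames_per_second):
    """Samples skipped between successive analysis frames (the decimation)."""
    # Assumption: the hop is rounded to a whole number of samples, at least 1.
    return max(1, round(sample_rate / frames_per_second))

def analysis_rate_for_expansion(base_rate, expansion_factor):
    """Scale the analysis rate in proportion to a time expansion factor."""
    return base_rate * expansion_factor

print(hop_size(48000, 200))
print(analysis_rate_for_expansion(200, 2))
```

At a sampling rate of 48000, 200 frames per second corresponds to skipping 240 samples between frames; doubling the duration of a sound would call for about 400 frames per second.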


Begin Time in Seconds

The time, in seconds, at which to begin processing the soundfile.


Compression Threshold in Decibels

Determines the threshold for compression. Any frequency component louder than this threshold will be compressed.


Data Type

Determines how the instrument will read the values in the fields which set the upper and lower detection boundaries. 0 means the values will be read as frequencies, 1 as being in the form octave.pitchclass.


Decibels of Compression

Determines by how many decibels to reduce frequencies that are louder than the compression threshold.


Detection Method

The method used to detect pitches. 0 uses the strongest harmonic collection of formants: it finds up to 12 of the strongest formants and determines the fundamental based on the reinforcement of its harmonic spectrum by other formants. 1 uses the strongest formant between the high/low boundaries. 2 uses the band-limited centroid, which is the average frequency of the spectrum between the high/low boundaries, weighted by the amplitude squared.
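
As an illustration of method 2, the band-limited centroid can be sketched as an amplitude-squared weighted average of the bin frequencies that fall between the boundaries. This is a sketch under that reading of the description, not the routine's own code; the function name and its list-based inputs are hypothetical.

```python
def band_limited_centroid(freqs, amps, low, high):
    """Amplitude-squared weighted mean frequency of the bins within [low, high]."""
    weighted_sum = 0.0
    power_sum = 0.0
    for f, a in zip(freqs, amps):
        if low <= f <= high:
            power = a * a          # weight each bin by its amplitude squared
            weighted_sum += f * power
            power_sum += power
    # Return 0.0 when the band holds no energy (an arbitrary choice for this sketch).
    return weighted_sum / power_sum if power_sum > 0.0 else 0.0

# Three bins inside the band; the strong middle bin dominates the result.
print(band_limited_centroid([100.0, 200.0, 300.0, 5000.0],
                            [1.0, 2.0, 1.0, 9.0], 50.0, 1000.0))
```

Note that the loud bin at 5000 lies outside the boundaries and so has no influence on the result.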


Detection Threshold in dB


Detection Window Size Maximum in Seconds

No part of the soundfile after this time index will be used for pitch tracking.


Detection Window Size Minimum in Seconds

No part of the soundfile before this time index will be used for pitch tracking.


End Time in Seconds

The time, in seconds, at which to stop processing the soundfile. 0 or less is equivalent to the duration of the soundfile.


Envelope Modifications

The rate at which amplitude changes are allowed to occur affects how smooth spectral evolutions will be. To control this, many routines contain attack and decay response time controls; once translated, these controls manipulate the coefficients of the following filter.

y(n) = (1. - A) * x(n) + A * y(n-1)

The filter is a lowpass designed to increasingly smooth the sudden changes in a signal as the value of the coefficient, A, is increased. Its control is through the response time parameter, which is the time in seconds it takes a signal, shifting from one state to another, to decay to -60 dB relative to its former state. Response times are transformed to create the necessary coefficients for the selected frame rate. The response time is separated into attack and decay; this allows separate control of the smoothing of the signal depending upon whether it is increasing or decreasing in amplitude. Short attack/decay response times can be used in places where dynamic processing induces garble or even pops. You can use longer response times to generally smooth or blur the onset/offset of sound components, particularly if the response controls are being applied to a time-varying filter. When applied to amplitudes, longer decay response times do not sound good, for in their delay of the decay, they end up amplifying the residual noise of a sound.
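
The smoothing described above can be sketched as follows. The conversion from response time to coefficient is an assumption on my part: it picks A so that a step decays by a factor of 0.001 (-60 dB) after the response time has elapsed at the given frame rate. The function names are hypothetical, and the routine's own translation may differ.

```python
def response_coeff(response_time, frame_rate):
    """Coefficient A giving a -60 dB decay after response_time seconds."""
    if response_time <= 0.0:
        return 0.0  # no smoothing: the output follows the input immediately
    # Assumption: A ** (response_time * frame_rate) == 0.001, i.e. -60 dB.
    return 0.001 ** (1.0 / (response_time * frame_rate))

def smooth(values, attack_time, decay_time, frame_rate):
    """One-pole lowpass with separate coefficients for rising and falling input."""
    a_attack = response_coeff(attack_time, frame_rate)
    a_decay = response_coeff(decay_time, frame_rate)
    y = 0.0
    out = []
    for x in values:
        a = a_attack if x > y else a_decay  # rising input uses the attack coefficient
        y = (1.0 - a) * x + a * y           # y on the right is the previous output
        out.append(y)
    return out

# A one-frame burst followed by silence, at 100 frames per second.
env = smooth([1.0] + [0.0] * 10, attack_time=0.0, decay_time=0.1, frame_rate=100)
print(env[-1])
```

With a decay response time of 0.1 seconds at 100 frames per second, the envelope is about 0.001 of its former level ten frames after the burst ends.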

Envelope Attack Time in Seconds

Envelope attack time affects the speed at which the amplitude of a sound changes. Large values blur the sound's attack, smaller values sharpen it.

Envelope Release Time in Seconds

Envelope release time affects the speed at which the amplitude of a sound changes. Large values cause the sound to fade for a longer period, smaller values cause the sound to cut off more suddenly.


FFT Length

The FFT size must be a power of 2. Larger FFT sizes resolve frequencies better but transient behavior more poorly. Choose your FFT size according to the sound you are working with. A size of 1024 or 2048 works well in most cases.
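
The trade-off can be put into numbers with a short sketch. The helper names are hypothetical; the two formulas are the standard relations between FFT length, sampling rate, bin spacing, and frame duration.

```python
def is_power_of_two(n):
    """True when n is a positive power of 2, as the FFT length must be."""
    return n > 0 and (n & (n - 1)) == 0

def bin_spacing_hz(sample_rate, fft_length):
    """Distance in Hz between adjacent FFT bins (smaller = finer frequency detail)."""
    return sample_rate / fft_length

def frame_duration_s(sample_rate, fft_length):
    """Span of time covered by one FFT frame (larger = more smeared transients)."""
    return fft_length / sample_rate

for n in (512, 1024, 2048, 4096):
    print(n, bin_spacing_hz(48000, n), frame_duration_s(48000, n))
```

Each doubling of the FFT length halves the bin spacing and doubles the span of time over which a transient is spread.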


Gate Threshold in Decibels


High Frequency/Pitch Boundary

The upper boundary used when analyzing the input soundfile. Frequencies/pitches above this will be ignored.


Low Frequency/Pitch Boundary

The lower boundary used when analyzing the input soundfile. Frequencies/pitches below this will be ignored.


Multiple Channel Method

Determines whether the output file will contain data on the peak or average amplitude for each frame. 0 indicates peak amplitude, 1 indicates average.


Output Data Format

Determines what kind of data is saved. 0 outputs frequency data, 1 outputs octave.decimal code, 2 outputs semitones of deviation from the reference pitch, and 3 outputs negative semitones of deviation from the reference pitch.
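
The four formats can be illustrated with a small conversion sketch. Two things here are assumptions rather than facts about the routine: that octave.decimal code places middle C at 8.00 (as in Csound's oct notation), and that format 3 is simply format 2 with the sign inverted. The function names are hypothetical.

```python
import math

# Assumption: 8.00 in octave.decimal code is middle C, 9 semitones below A 440.
MIDDLE_C_HZ = 440.0 * 2.0 ** (-9.0 / 12.0)

def to_octave_decimal(freq):
    """Frequency in Hz to octave.decimal (octave plus fraction of an octave)."""
    return 8.0 + math.log2(freq / MIDDLE_C_HZ)

def semitone_deviation(freq, reference):
    """Signed distance in semitones from the reference frequency."""
    return 12.0 * math.log2(freq / reference)

def convert(freq, data_format, reference=440.0):
    """Express a tracked frequency in one of the four output data formats."""
    if data_format == 0:
        return freq
    if data_format == 1:
        return to_octave_decimal(freq)
    if data_format == 2:
        return semitone_deviation(freq, reference)
    if data_format == 3:
        return -semitone_deviation(freq, reference)  # assumed: sign-inverted format 2
    raise ValueError("output data format must be 0, 1, 2, or 3")

for fmt in range(4):
    print(fmt, convert(880.0, fmt))
```

Under these assumptions, a tracked frequency of 880 with a reference of 440 comes out as 12 semitones in format 2 and -12 in format 3.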


Output Data Type

Determines how the data will be saved. 0 indicates an ASCII file, 1 indicates 32-bit floats.


Output Samples per Second


Pitch Trajectory Smoothing: Frequency Response Time


Print Elapsed Time

Prints out the time index where the process currently is in the soundfile while it is being analyzed.

0 turns this off, 1 turns it on.


Reference Frequency

Used when output data format is set to 2 or 3. See output data format for more information.


Resynthesis Channel

All routines allow both monophonic and multi-channel input files to be processed. With multi-channelled files, you can either select one channel and produce a monophonic output file, or process all the channels. Channels are numbered beginning with 1. Processing of multi-channelled files is done one channel at a time beginning with channel 1, with zeros written to channels which have yet to be processed. Processing one channel at a time requires less memory and allows you to audition the output sooner than if you did all channels at once.

Use 0 to process all channels.


Window Size in Samples

The window size is a less opaque parameter; like the FFT, it must be a power of 2. Windows twice the size of the FFT work well. Larger window sizes may resolve frequencies better. Specifying 0 for the window size will automatically set the window to twice the FFT size.