Bio - A Bioacoustical Analyzer

Øyvind Hammer
University of Oslo

oyvindha@notam.uio.no

Oslo, February 17, 1997

Introduction

Bio is a simple program for sound analysis. Though aimed at bioacoustical applications, it should be useful in other fields as well (speech, music, vibration). The program displays a spectral analysis, amplitude, pitch, noise content and spectral centroid.

Installation

Bio will only run on SGI computers with IRIX 5.1 or higher. You should copy the bio executable to a directory in your path. The X resources file Bio must be copied to /usr/lib/X11/app-defaults.

How to use it

Import a sound using the Load & Analyse selection in the File menu. You should then get a sonogram, oscillogram, pitch track and noise curve. Using the buttons and sliders at the bottom of the screen, you can play the sound, move around and zoom in and out.

Mark an area with the right mouse button. Once an area has been marked, only that section is used for playback and export.

The Export options write time/value pairs to an ASCII file for later use in other programs. Matlab and many other applications can read these files directly.
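As an illustration of reading an exported file in another program, here is a minimal Python sketch. The exact layout of the export file is not specified here, so the sketch assumes one whitespace-separated time/value pair per line; the function name read_pairs and the sample values are hypothetical.

```python
from typing import List, Tuple


def read_pairs(text: str) -> List[Tuple[float, float]]:
    """Parse 'time value' lines into (time, value) tuples.

    Assumes two whitespace-separated numbers per line; lines that do not
    fit this pattern (blank lines, comments) are skipped.
    """
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue
        try:
            pairs.append((float(fields[0]), float(fields[1])))
        except ValueError:
            continue  # non-numeric line
    return pairs


# Hypothetical file contents; a real export may look different.
sample = "0.000 440.0\n0.012 442.5\n\n0.024 441.0\n"
pairs = read_pairs(sample)
```

For a real file, pass the result of open(path).read() instead of the sample string.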

To show the centroid curve, you must tick the appropriate box under the Settings->Display option.

What does it mean?

The sonogram

The sonogram is a representation of the energy at different frequencies as it varies over time. With some practice, you can read a great deal of information from it. Time runs along the horizontal axis and frequency along the vertical. Energy content is visualized with a gray tone or color (as chosen in the Settings menu).

Pitched sounds like instrumental tones and vowels in speech will show up as a stack of horizontal bands, representing the partials (fundamental and overtones) of the sound. Whistling sounds with weak overtones may show only the fundamental as a single line. The fundamental frequency can be read off directly, and pitch variation phenomena like glissando and vibrato can be easily spotted. Noisy sounds will typically show as less coherent, cloudy masses. Resonances or formants (broad frequency areas with high energy content) can also be easily seen.

The oscillogram

The oscillogram shows the amplitude of the sound as it varies over time. Large amplitude will generally imply high loudness.

The pitch track

The pitch track is an attempt at tracking the fundamental frequency. This can be approximated by eye in the sonogram, but the pitch track gives a more exact representation. The pitch tracker is not perfect, however. For some sounds, it may choose the wrong partial as the fundamental, which will typically show up as octave errors.

The pitch track has a logarithmic frequency scale, because this is how the ear perceives pitch. The vertical scale can be said to give note values (C, C#, D, D#, E, F etc.).
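The relation between a logarithmic frequency scale and note values can be sketched in Python. This is only an illustration, not Bio's own code: it assumes equal temperament with A4 = 440 Hz and the MIDI numbering convention (A4 = 69), neither of which is stated for the program itself.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_name(freq_hz: float, a4_hz: float = 440.0) -> str:
    """Return the nearest equal-tempered note name for a frequency."""
    # Each doubling of frequency is one octave, i.e. 12 semitones.
    semitones_from_a4 = round(12 * math.log2(freq_hz / a4_hz))
    midi = 69 + semitones_from_a4  # assumed convention: A4 = 69
    octave = midi // 12 - 1
    return f"{NOTE_NAMES[midi % 12]}{octave}"


names = [note_name(f) for f in (261.63, 440.0, 880.0)]
```

Equal steps on the vertical axis then correspond to equal musical intervals, whatever the absolute frequency.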

The noise curve

The noise curve gives an idea of the noise (or hiss) content in the sound. Pure tones should give low values, while highly complex or noisy sounds should give high values. Again, the values here may unfortunately not comply fully with what you hear.

The centroid

The centroid shows the frequency around which all the energy in the spectrum seems to be centered.

Technicalities

The sonogram

The sonogram is computed using a 1024 point FFT with window size 2048 and overlap factor 4. The program uses a Hamming window function. The colors are coded according to a logarithmic scale.
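The framing and windowing can be sketched as follows. This is a simplified illustration rather than Bio's implementation: it assumes that "overlap factor 4" means a hop of one quarter of the window (512 samples for a 2048-sample window), it uses a DFT as long as the window, and it uses a slow direct DFT so that it needs no external library. The demo therefore uses a small window.

```python
import cmath
import math
from typing import List


def hamming(n: int) -> List[float]:
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def sonogram(signal: List[float], window_size: int, overlap: int) -> List[List[float]]:
    """Return one row of log magnitudes (dB) per analysis frame."""
    hop = window_size // overlap  # assumed meaning of "overlap factor"
    win = hamming(window_size)
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop):
        seg = [signal[start + i] * win[i] for i in range(window_size)]
        row = []
        for k in range(window_size // 2):  # keep non-negative frequencies
            s = sum(seg[i] * cmath.exp(-2j * math.pi * k * i / window_size)
                    for i in range(window_size))
            row.append(20 * math.log10(abs(s) + 1e-12))  # logarithmic scale
        frames.append(row)
    return frames


# Demo: a sine centered on bin 8 of a 64-sample window.
signal = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
frames = sonogram(signal, window_size=64, overlap=4)
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in frames]
```

Each row corresponds to one column of the display; mapping the dB values to gray tones or colors is left out.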

The pitch track

The pitch track is computed using the following algorithm. First, the differences in FFT phase values from window to window are converted to an approximation of the exact frequency of the assumed single partial contained in each bin (the so-called phase vocoder method).
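The phase vocoder step can be illustrated with a short Python sketch. It is a generic textbook form of the method, not Bio's code: the function names, the absence of a window function and the single-bin DFT are simplifications made here for brevity.

```python
import cmath
import math
from typing import List


def bin_phase(signal: List[float], start: int, size: int, k: int) -> float:
    """Phase of DFT bin k for the frame beginning at sample `start`."""
    s = sum(signal[start + i] * cmath.exp(-2j * math.pi * k * i / size)
            for i in range(size))
    return cmath.phase(s)


def refined_frequency(signal: List[float], start: int, size: int, hop: int,
                      k: int, sample_rate: float) -> float:
    """Estimate the frequency (Hz) of the partial in bin k from the phase
    advance between two frames that are `hop` samples apart."""
    dphi = bin_phase(signal, start + hop, size, k) - bin_phase(signal, start, size, k)
    expected = 2 * math.pi * k * hop / size  # advance of a bin-centered partial
    deviation = dphi - expected
    deviation -= 2 * math.pi * round(deviation / (2 * math.pi))  # wrap to [-pi, pi]
    return (expected + deviation) * sample_rate / (2 * math.pi * hop)


# Demo: a 1010 Hz sine at 8 kHz; bin 32 of a 256-point frame is centered at 1000 Hz.
rate = 8000.0
signal = [math.sin(2 * math.pi * 1010.0 * n / rate) for n in range(512)]
estimate = refined_frequency(signal, start=0, size=256, hop=64, k=32, sample_rate=rate)
```

The bin spacing in the demo is 31.25 Hz, so the phase difference recovers a frequency well inside a single bin.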

The strongest partial is then selected as the fundamental. This may in fact represent the first or second overtone, however. The frequencies at 1/2 and 1/3 of this are therefore checked for high energy and can then be selected as the preferred fundamental.
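A minimal sketch of this selection step is given below. The energy threshold (ratio), the frequency tolerance and the order in which the two candidates are tried are assumptions made for illustration; the values actually used by Bio are not documented here.

```python
from typing import List, Tuple


def pick_fundamental(partials: List[Tuple[float, float]],
                     ratio: float = 0.5, tolerance: float = 0.03) -> float:
    """Choose a fundamental from (frequency_hz, amplitude) pairs.

    Starts from the strongest partial, then accepts a partial near 1/2 or
    1/3 of its frequency if that partial is strong enough.
    `ratio` and `tolerance` are hypothetical tuning values.
    """
    peak_freq, peak_amp = max(partials, key=lambda p: p[1])
    for divisor in (2, 3):
        target = peak_freq / divisor
        for freq, amp in partials:
            close = abs(freq - target) <= tolerance * target
            if close and amp >= ratio * peak_amp:
                return freq
    return peak_freq


# Strongest partial is the first overtone (440 Hz); 220 Hz is also strong.
f_a = pick_fundamental([(220.0, 0.6), (440.0, 1.0), (660.0, 0.3)])
# Strongest partial is the second overtone (600 Hz) of a 200 Hz tone.
f_b = pick_fundamental([(200.0, 0.7), (400.0, 0.2), (600.0, 1.0)])
# No strong energy at 1/2 or 1/3: keep the strongest partial.
f_c = pick_fundamental([(440.0, 1.0), (880.0, 0.4)])
```

A threshold set too low makes the tracker jump down an octave on weak subharmonic energy; set too high, it stays on the overtone. This is the source of the octave errors mentioned earlier.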

"Unreasonable" pitch values are discarded in the graphical representation. A gap in the pitch track may therefore indicate silence, an un-pitched (noisy) sound or a highly complex (possibly polyphonic) sound.

The noise curve

The noise curve shows the RMS energy of the residual after 10-pole linear predictive coding (LPC). The values are normalized by the RMS energy of the original signal.
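The measure can be sketched in Python as follows. The sketch assumes the autocorrelation method with the Levinson-Durbin recursion for the LPC fit; which LPC variant Bio uses is not stated, and the function names are hypothetical.

```python
import math
import random
from typing import List


def lpc_coefficients(x: List[float], order: int) -> List[float]:
    """Predictor coefficients a[1..order] with x[n] ~ sum(a[j] * x[n-j])."""
    n = len(x)
    r = [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(order + 1)]
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # signal is silent or perfectly predictable
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def rms(x: List[float]) -> float:
    return math.sqrt(sum(v * v for v in x) / len(x))


def noise_measure(x: List[float], order: int = 10) -> float:
    """RMS of the LPC residual divided by the RMS of the signal."""
    a = lpc_coefficients(x, order)
    residual = [x[n] - sum(a[j] * x[n - 1 - j] for j in range(order))
                for n in range(order, len(x))]
    return rms(residual) / rms(x)


random.seed(0)
tone = [math.sin(2 * math.pi * 0.05 * n) for n in range(512)]
hiss = [random.gauss(0.0, 1.0) for _ in range(512)]
tone_value = noise_measure(tone)  # predictable signal: small residual
hiss_value = noise_measure(hiss)  # unpredictable signal: residual near the input level
```

A pure tone is almost perfectly predictable from its past samples, so little energy remains in the residual; white noise is not, so the ratio stays close to 1.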

The centroid

For each analysis window, the centroid c is computed from the N=512 amplitude/frequency pairs as follows:
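A common definition of the spectral centroid, assumed in the sketch below, is the amplitude-weighted mean frequency: c = sum(a_i * f_i) / sum(a_i), where a_i and f_i are the amplitude and frequency of pair i. The function name is hypothetical, and whether Bio weights by amplitude or by energy should be checked against the program.

```python
from typing import Sequence


def spectral_centroid(freqs: Sequence[float], amps: Sequence[float]) -> float:
    """Amplitude-weighted mean frequency of one analysis window."""
    total = sum(amps)
    if total == 0:
        return 0.0  # silent window: no meaningful centroid
    return sum(f * a for f, a in zip(freqs, amps)) / total


# Three partials; the middle one is strongest, so the centroid sits on it.
centroid = spectral_centroid([100.0, 200.0, 300.0], [1.0, 2.0, 1.0])
```

For a real analysis window, freqs and amps would each hold the N=512 values mentioned above.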