Musical Event Recognition

In recent years, voice recognition technology has advanced to the point where it is commonly deployed as a means of human-to-machine communication in many day-to-day tasks. Similar achievements in music recognition, however, have yet to materialize. Except for the simplest music, machines still have difficulty recognizing what humans would consider even basic musical features. For example, machines often cannot reliably identify which notes are being played when multiple instruments are sounding simultaneously. In fact, even recognizing the types of instruments present remains a challenge.

Meanwhile, applications for music recognition abound. The canonical task for music recognition is automatic transcription: a program is given a music recording as input and produces a score of the music as output. While this grand goal may never be achieved, many related goals are more within reach. For example, data and text mining techniques are becoming a common and recognized method of discovery in humanities research. To date, however, musicology has largely been able to automate pattern recognition only in scores. Similar machine-based analysis of musical performance has been limited by the inability of machines to recognize fine musical structure. Likewise, musical instrument development is beginning to evolve toward instruments which are aware of their sonic environment. In the simplest case this might involve automated instruments synchronized to a score which is in turn being performed by human players. The task of the instrument is then to recognize where the players are in the score as they bumble their way through it amid missed notes, wrong notes, and audience coughing.

CREL researcher Kevin Larke is involved in ongoing research in this area, developing tools which address a variety of practical and experimental goals. The ft program performs acoustic feature extraction from audio files. The fc program performs audio segmentation based on sequential timbral clustering: it uses the data generated by ft to find the time boundaries that delimit timbrally dissimilar sections. Both programs are built on cm, a C language library which implements an application programming interface (API) for developing audio signal processing and recognition applications.
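
To make the idea of segmentation by sequential clustering concrete, here is a minimal sketch in Python. It is not the fc program or the cm API, neither of which is described in detail here. It assumes that each audio frame has already been reduced to a short feature vector (the role ft plays), and it uses a simple "leader" style rule chosen only for illustration: frames are visited in time order, and a new segment starts whenever a frame lies farther than a threshold from the running centroid of the current segment. The function names, the example feature values, and the threshold are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def segment_boundaries(frames, threshold):
    """Return the indices of frames that start a new segment.

    frames    -- feature vectors in time order (hypothetical input format)
    threshold -- distance from the current segment centroid beyond which
                 a frame is treated as the start of a new segment
    """
    if not frames:
        return []
    boundaries = []
    centroid = list(frames[0])  # running mean of the current segment
    count = 1                   # number of frames in the current segment
    for i in range(1, len(frames)):
        frame = frames[i]
        if euclidean(frame, centroid) > threshold:
            # Timbrally dissimilar frame: close the segment, open a new one.
            boundaries.append(i)
            centroid = list(frame)
            count = 1
        else:
            # Similar frame: fold it into the running mean.
            count += 1
            centroid = [c + (x - c) / count for c, x in zip(centroid, frame)]
    return boundaries


# Made-up two-dimensional features: three similar frames, two frames with a
# different timbre, then a return to the first timbre.
frames = [
    (0.10, 0.20), (0.12, 0.19), (0.11, 0.21),
    (0.90, 0.80), (0.88, 0.82),
    (0.10, 0.20),
]
print(segment_boundaries(frames, threshold=0.5))  # prints [3, 5]
```

Multiplying each returned frame index by the analysis hop duration would turn these indices into time boundaries. A practical segmenter would likely also smooth the features and tune the threshold to the material.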