CREL Director Shlomo Dubnov; Liliya Tsirulnik, Visiting Researcher at UCSD's Computer Audition Lab; and UCSD Professors of Music Susan Narucki and Philip Larson have teamed up to create the Singing Voice Database (SVDB), which may be used for research on singing voice expressions and for the development of a Singing Voice Synthesis System.

The Singing Voice Database (SVDB) consists of plain recordings and recordings of special singing expressions, which make it possible to use the SVDB to study the physical characteristics of different singing expressions. The SVDB also includes recordings of sung musical scales, which make it suitable for use in a Singing Voice Synthesis System.

Database description and structure

The SVDB consists of two major parts:

  1. The singing musical scale recordings ("Ah-ah-ah" and "La-la-la" versions).
  2. The full song singing recordings with different expressions.

Every recording includes two waveform files: one containing the singer's voice ("vocal recordings") and one containing the sound recorded near the singer's glottis ("glottal recordings"). Both parts (the scale and the song) contain studio recordings of female and male professional singers. The first part includes recordings of Bonnie Lander and Philip Larson. The second part includes recordings of Susan Narucki and Philip Larson.

We expect that the first part will be useful for developing a Singing Voice Synthesis system, while the second part will be used mostly for research. Both parts contain plain recordings and recordings with the special singing expressions described in Table 1 below.

Table 1. Special singing expressions

Expression # | Expression Name | Description
1 | Bounce | Increased articulation on consonants (slight increase of weight on initial consonants), followed by decrease of weight on adjacent vowels. More rhythmic vitality of a regular sort.
2 | Hollow | Less articulation of consonants. Modification of vowels to minimize their differences with the addition of "air" in the tone (as opposed to focused tone).
3 | Light | Minimal initial articulation and weight. Modification of vowels to emphasize "brightness" (upper partials).
4 | Soft | Modification of vowels; some air added, low volume. Consonants are present, but not sharply defined.
5 | Sweet | Extreme legato. Pure vowels. Consonants present, but without extra articulated weight.
6 | Flat | Affectless. Consonants and vowels with same weight. Minimizing melodic contour.
7 | Mature | Emphasis on heavier vibrato in sound; (irregular) emphasis on lower partials of vowels (dark rather than bright).
8 | Sharp | Emphasis on forward placement of vowel, cutting off lower partials. Aggressive articulation of consonants.
9 | Husky | Irregular rhythmic inflection in phrasing. Irregular pronunciation of consonants and vowels, additional throat grab noises and air to vowel mix.
10 | Clear | Purity of vowels and consonants. Emphasis on regularity of pronunciation. Sincere affect.

Musical Scale Recordings

This part of the SVDB contains the following groups of recordings:

  • The scale (musical notes) performed using the "ah" vowel ("ah-ah-ah" recordings)
  • The scale (musical notes) performed using the "la" syllable ("la-la-la" recordings)
  • The transitions between notes performed using the "ah" vowel
  • The transitions between notes performed using the "la" syllable
  • The scale performed with both "ah" and "la" using the singing expressions listed in Table 1.

Every recording was performed over the following range of musical notes:

  • Female voice (Bonnie Lander): from A3 to A5.
  • Male voice (Philip Larson): from A2 to E4.
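
For readers who want these ranges as frequencies, the following is a minimal sketch, not part of the SVDB itself. It assumes scientific pitch notation and equal temperament with A4 = 440 Hz, neither of which is stated in the database description. The function names are hypothetical.

```python
# Assumption: equal temperament, A4 = 440 Hz, scientific pitch notation (C4 = MIDI 60).
# The SVDB description does not state the tuning reference used in the studio.

# Semitone offsets of the natural notes within an octave, relative to C.
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# Note ranges as given in the database description.
RANGES = {"female": ("A3", "A5"), "male": ("A2", "E4")}


def note_to_midi(name: str) -> int:
    """Convert a natural note name such as 'A3' to a MIDI note number."""
    letter, octave = name[0].upper(), int(name[1:])
    return 12 * (octave + 1) + SEMITONES[letter]


def midi_to_hz(midi: int, a4_hz: float = 440.0) -> float:
    """Convert a MIDI note number to a frequency in Hz (equal temperament)."""
    return a4_hz * 2.0 ** ((midi - 69) / 12)


def describe_range(low: str, high: str) -> dict:
    """Return the span in semitones and the boundary frequencies of a range."""
    lo, hi = note_to_midi(low), note_to_midi(high)
    return {
        "semitones": hi - lo,
        "low_hz": midi_to_hz(lo),
        "high_hz": midi_to_hz(hi),
    }


if __name__ == "__main__":
    for voice, (low, high) in RANGES.items():
        info = describe_range(low, high)
        print(f"{voice}: {low}-{high}, {info['semitones']} semitones, "
              f"{info['low_hz']:.2f}-{info['high_hz']:.2f} Hz")
```

Under these assumptions the female range spans two octaves (220 Hz to 880 Hz) and the male range spans an octave and a fifth (110 Hz to about 330 Hz). Whether every semitone in each range was recorded is not specified above.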

Both vocal and glottal recordings are provided. For every waveform file, the voiced and unvoiced segments are marked in a separate annotation file, which has the same name as the waveform file and the extension "lab".

Directory and File Structure

Files of the first part are contained in the "Scale recordings" directory. The recordings are in WAVE PCM format with the following characteristics: 44100 Hz; 16 bit; 1 channel (mono). There is an annotation file (.lab) associated with each .wav file. The .lab files have the same format as the TIMIT database files, namely:

START_POINT END_POINT SEGMENT_NAME

START_POINT :== The starting point of the segment, *108 ms

END_POINT :== The end point of the segment, *108 ms

SEGMENT_NAME :== The name of the specified segment: v for a voiced segment and u for an unvoiced segment.
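
As an illustration of how such an annotation file might be read, here is a minimal sketch. It assumes that each non-empty line holds three whitespace-separated fields in the order given by the definitions above (start, end, segment name) and that the time fields are plain numbers. The time values are kept in the file's own units and are not converted. The names Segment, parse_lab and voiced_total are hypothetical and are not part of the SVDB.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float  # START_POINT, in the units used by the .lab file
    end: float    # END_POINT, in the units used by the .lab file
    name: str     # SEGMENT_NAME: "v" (voiced) or "u" (unvoiced)


def parse_lab(text: str) -> List[Segment]:
    """Parse .lab content, assuming 'START END NAME' on each non-empty line."""
    segments = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(fields)}")
        start, end, name = float(fields[0]), float(fields[1]), fields[2]
        if name not in ("v", "u"):
            raise ValueError(f"line {line_no}: unknown segment name {name!r}")
        if end < start:
            raise ValueError(f"line {line_no}: segment ends before it starts")
        segments.append(Segment(start, end, name))
    return segments


def voiced_total(segments: List[Segment]) -> float:
    """Total length of all voiced segments, in the file's own time units."""
    return sum(s.end - s.start for s in segments if s.name == "v")


if __name__ == "__main__":
    # Made-up sample content for illustration; not taken from the database.
    sample = "0 100 u\n100 350 v\n350 400 u\n"
    segs = parse_lab(sample)
    print(len(segs), "segments, voiced total:", voiced_total(segs))
```

The parser raises an error on malformed lines instead of skipping them, so that a mismatch between this assumed layout and the actual files is noticed early.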

The .wav and .lab files for each recording have the same name and differ only in extension.
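
The properties stated in this section (44100 Hz, 16 bit, mono, and a matching .lab file for each .wav file) can be checked with a short script. The following is a sketch using only the Python standard library. The function name check_recording is hypothetical, and the script assumes that the annotation file sits in the same directory as the waveform file.

```python
import wave
from pathlib import Path
from typing import List

# Expected WAVE PCM parameters, taken from the description above.
# A sample width of 2 bytes corresponds to 16-bit audio.
EXPECTED = {"framerate": 44100, "sampwidth": 2, "nchannels": 1}


def check_recording(wav_path) -> List[str]:
    """Return a list of problems found for one .wav file (empty if none)."""
    wav_path = Path(wav_path)
    with wave.open(str(wav_path), "rb") as w:
        actual = {
            "framerate": w.getframerate(),
            "sampwidth": w.getsampwidth(),
            "nchannels": w.getnchannels(),
        }
    problems = [
        f"{key}: expected {value}, found {actual[key]}"
        for key, value in EXPECTED.items()
        if actual[key] != value
    ]
    # Assumption: the .lab file lives next to the .wav file.
    lab_path = wav_path.with_suffix(".lab")
    if not lab_path.is_file():
        problems.append(f"missing annotation file {lab_path.name}")
    return problems


if __name__ == "__main__":
    # "Scale recordings" is the directory name given above; adjust the path
    # to wherever the database has been unpacked.
    for path in sorted(Path("Scale recordings").rglob("*.wav")):
        for problem in check_recording(path):
            print(f"{path}: {problem}")
```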