SUD12 is a speech database recorded by the Center for Speech and Language Technology at Tsinghua University for the purpose of short utterance speaker recognition. It is free for all research institutes and individuals.

The database comprises two datasets. The training set contains 61 speakers, each producing 100 utterances. We provide the training data in two forms: Train-1 contains the original continuous speech; Train-2 segments the speech signals into IFs and merges all segments of the same IF into a single audio file. The test set contains 61 speakers, each producing 63 utterances.

The recordings were made in a quiet office, with a sampling rate of 16 kHz and a precision of 16 bits.
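
The stated format can be checked after download with a short script. The following is a minimal sketch in Python using only the standard-library wave module; it assumes the files are ordinary PCM WAV files, and the function name check_wav_format and the example path are hypothetical, not part of the database distribution.

```python
import wave

# Format stated in the database description: 16 kHz sampling rate, 16-bit samples.
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits = 2 bytes per sample


def check_wav_format(path):
    """Return True if the WAV file at `path` has the expected rate and sample width.

    Hypothetical helper for illustration; assumes an uncompressed PCM WAV file.
    """
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        width = wav_file.getsampwidth()
    return rate == EXPECTED_RATE_HZ and width == EXPECTED_SAMPLE_WIDTH_BYTES
```

For example, check_wav_format("Train-1/some_utterance.wav") would return True for a file matching the description; the path shown here is a placeholder, since the actual file layout inside each directory is not specified.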

wav [download]
|Train-1 : training data in continuous speech
|Train-2 : training data segmented according to phones
|Test : test data
doc [download]
| : transcripts and papers

Improving Short Utterance Speaker Recognition by Modeling Speech Unit Classes


For any queries, please contact:

Thomas Fang Zheng:
Dong  Wang       :


BLDG FIT, RM 1-303
Tsinghua University