THCHS30 is an open Chinese speech database published by the Center for Speech and Language Technology (CSLT) at Tsinghua University.
The original recording was conducted in 2002 by Dong Wang, supervised by Prof. Xiaoyan Zhu, at the Key State Lab of Intelligence and System,
Department of Computer Science, Tsinghua University, and the original name was 'TCMSD', standing for 'Tsinghua Continuous Mandarin
Speech Database'. Its publication, 13 years later, was initiated by Dr. Dong Wang
and was supported by Prof. Xiaoyan Zhu. We hope to provide a toy database for new researchers in the field of speech recognition. Therefore,
the database is totally free to academic users.
The entire package includes the full
set of speech and language resources required to establish a Chinese speech recognition system.
We call for competition on this database. Two challenges are set up,
and researchers are welcome to challenge the current state-of-the-art!
Check the challenge page for details.
There are two versions of THCHS-30: the OpenSLR version, which is easy to use with the Kaldi
toolkit, and the standalone version, which contains the same content in a slightly
different format. If you work with other toolkits (HTK, Sphinx, etc.), the standalone version is more appropriate.
The THUYG20 RECIPE IS NOW AVAILABLE:
All the resources contained in the database are free for research institutes and individuals. No commercial usage is permitted.
We would be very happy if you cited the following paper in your publications:
Dong Wang, Xuewei Zhang, CSLT TRP 20150016: THCHS-30 : A Free Chinese Speech Corpus. [pdf][arXiv 1512.01882]
An earlier paper (if it can be called a paper) about the database, from 13 years ago:
Dong Wang, Dalei Wu, Xiaoyan Zhu, "TCMSD: A new Chinese Continuous Speech Database",
International Conference on Chinese Computing (ICCC'01), 2001, Singapore. [pdf]
Dong Wang, Xuewei Zhang, Zhiyong Zhang @CSLT, Tsinghua Univ.
Dong Wang: firstname.lastname@example.org
Xuewei Zhang: email@example.com
Zhiyong Zhang: firstname.lastname@example.org
CSLT, Tsinghua University
ROOM1-303, BLDG FIT