You are in the CSLT open data repository.

Some typical databases we have released include:

- Resource
  * Uyghur text: Uyghur text data for document classification, document summarization, etc.
  * Cantonese lexicon: Cantonese lexicon collected from Adam Sheik's Cantonese Dict project.

- Audio
  * Disguise database: normal and disguised human speech
  * Trivial events database: 7 types of trivial human vocal events: cough, laugh, "wei", "hmm", "tsk-tsk", "ahem", sniff
  * SUD-12 database: short utterance database for speaker recognition
  * THUYG-20 database: Uyghur speech database for speech recognition
  * THUYG-20 SRE database: Uyghur speech database for speaker recognition
  * THUCH30 database: Chinese speech database for speech recognition
  * Kazak ASR database: Kazak speech database for speech recognition
  * Tibetan ASR database: Tibetan speech database for speech recognition
  * CSLT-Chronos database: a time-varying database for speaker recognition

...


More information can be found in our CSLT Free Data Repository.

Project-specific data can be found in CSLT Active Projects.