You are in the CSLT open data repository.
Some typical databases we have released include:
- Resource
* Uyghur text: Uyghur text data for document classification, document summarization, etc.
* Cantonese lexicon: Cantonese lexicon collected from Adam Sheik's Cantonese Dict project.
- Audio
* Disguise database: normal and disguised human speech
* Trivial events database: 7 types of trivial human vocal events: cough, laugh, "wei", "hmm", "tsk-tsk", "ahem", sniff
* SUD-12 database: short utterance database for speaker recognition
* THUYG-20 database: Uyghur speech database for speech recognition
* THUYG-20 SRE database: Uyghur speech database for speaker recognition
* THUCH30 database: Chinese speech database for speech recognition
* Kazak ASR database: Kazak speech database for speech recognition
* Tibetan ASR database: Tibetan speech database for speech recognition
* CSLT-Chronos database: a time-varying database for speaker recognition
* CSLT-ESDB database: speech emotion database for emotion recognition
....
More information can be found in our CSLT Free Data Repository.
For project-specific data, see CSLT Active Projects.