Large Sundanese ASR training data set

Abstract

Sundanese ASR training data set containing ~220K utterances of transcribed audio. The data set consists of wave files and a TSV file. The TSV file, utt_spk_text.tsv, contains a FileID, a UserID, and the transcription of the audio in the corresponding file. The data set has been manually quality checked, but errors may remain. It was collected by Google in Indonesia.
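As an illustration of the file layout described above, the following Python sketch reads rows of utt_spk_text.tsv into (FileID, UserID, transcription) records and counts utterances per speaker. It assumes three tab-separated columns in that order with no header row, which the abstract implies but does not state outright. The names Utterance, read_utterances, and utterances_per_speaker are hypothetical helpers introduced here and are not part of the data set.

```python
import csv
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class Utterance(NamedTuple):
    """One row of utt_spk_text.tsv (column order is an assumption)."""
    file_id: str
    user_id: str
    text: str


def read_utterances(lines: Iterable[str]) -> Iterator[Utterance]:
    """Yield one Utterance per well-formed, three-column TSV row.

    Assumes tab-separated fields and no header row. Rows that do not
    have exactly three fields (including blank lines) are skipped.
    """
    # QUOTE_NONE keeps quote characters in transcriptions as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue
        yield Utterance(*(field.strip() for field in row))


def utterances_per_speaker(utterances: Iterable[Utterance]) -> Counter:
    """Count how many utterances each UserID contributed."""
    return Counter(u.user_id for u in utterances)
```

To use it on the real file, one might pass an open file handle, for example `open("utt_spk_text.tsv", encoding="utf-8", newline="")`, to `read_utterances`. The UTF-8 encoding is also an assumption. Each FileID can then be matched to its audio file by whatever naming scheme the downloaded archive uses.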

Citation Key

_an

URL

https://openslr.org/36/

Citation

Large Sundanese ASR training data set. (n.d.). [Dataset]. Retrieved from https://openslr.org/36/