Search results: 78 resources
- Multi-speaker TTS data for Javanese (jv-ID). This data set contains high-quality transcribed audio data for Javanese. The data set consists of wave files and a TSV file. The file line_index.tsv contains a filename and the transcription of audio in the file. Each filename is prepended with a speaker identification number. The data set has been manually quality checked, but there might still...
- Multi-speaker TTS data for Khmer (km-KH). This data set contains high-quality transcribed audio data for Khmer. The data set consists of wave files and a TSV file. The file line_index.tsv contains a filename and the transcription of audio in the file. Each filename is prepended with a speaker identification number. The data set has been manually quality checked, but there might still be...
- Multi-speaker TTS data for Nepali (ne-NP). This data set contains high-quality transcribed audio data for Nepali. The data set consists of wave files and a TSV file. The file line_index.tsv contains a filename and the transcription of audio in the file. Each filename is prepended with a speaker identification number. The data set has been manually quality checked, but there might still be...
- Multi-speaker TTS data for Sundanese (su-ID). This data set contains high-quality transcribed audio data for Sundanese. The data set consists of wave files and a TSV file. The file line_index.tsv contains a filename and the transcription of audio in the file. Each filename is prepended with a speaker identification number. The data set has been manually quality checked, but there might...
- Bengali ASR training data set containing ~196K utterances. This data set contains transcribed audio data for Bengali. The data set consists of wave files and a TSV file. The file utt_spk_text.tsv contains a FileID, an anonymized UserID, and the transcription of audio in the file. The data set has been manually quality checked, but there might still be errors.
- Javanese ASR training data set containing ~185K utterances. This data set contains transcribed audio data for Javanese. The data set consists of wave files and a TSV file. The file utt_spk_text.tsv contains a FileID, a UserID, and the transcription of audio in the file. The data set has been manually quality checked, but there might still be errors. This dataset was collected by Google in...
- Nepali ASR training data set containing ~157K utterances. This data set contains transcribed audio data for Nepali. The data set consists of wave files and a TSV file. The file utt_spk_text.tsv contains a FileID, an anonymized UserID, and the transcription of audio in the file. The data set has been manually quality checked, but there might still be errors.
- Sinhala ASR training data set containing ~185K utterances. This data set contains transcribed audio data for Sinhala. The data set consists of wave files and a TSV file. The file utt_spk_text.tsv contains a FileID, an anonymized UserID, and the transcription of audio in the file. The data set has been manually quality checked, but there might still be errors.
- Sundanese ASR training data set containing ~220K utterances. This data set contains transcribed audio data for Sundanese. The data set consists of wave files and a TSV file. The file utt_spk_text.tsv contains a FileID, a UserID, and the transcription of audio in the file. The data set has been manually quality checked, but there might still be errors. This dataset was collected by Google in Indonesia.
- Korean Open-source Speech Corpus for Speech Recognition by Zeroth Project. The data set contains transcribed audio data for Korean. There are 51.6 hours of transcribed Korean audio for training (22,263 utterances, 105 people, 3000 sentences) and 1.2 hours of transcribed Korean audio for testing (457 utterances, 10 people). This corpus also contains pre-trained/designed language model,...
- This data set contains transcribed speech in Amharic, Swahili, and Wolof.
- Aishell is an open-source Chinese Mandarin speech corpus published by Beijing Shell Shell Technology Co., Ltd. 400 people from different accent areas in China were invited to participate in the recording, which was conducted in a quiet indoor environment using a high-fidelity microphone and downsampled to 16 kHz. The manual transcription accuracy is above 95%, through professional speech annotation...
- African Accented French Corpus. This corpus consists of approximately 22 hours of speech recordings. Transcripts are provided for all the recordings. The corpus can be divided into 3 parts: 1. Yaounde: collected by a team from the U.S. Military Academy's Center for Technology Enhanced Language Learning (CTELL) in 2003 in Yaoundé, Cameroon. It has recordings from 84 speakers, 48 male and 36...
- PhonemeDF is a large-scale phoneme-level parallel dataset of real and synthetic speech (approximately 730 hours), designed for audio deepfake detection and speech naturalness evaluation. The dataset consists of real speech samples derived from a subset of the LibriSpeech corpus (train-clean-100) and corresponding synthetic speech generated using four Text-to-Speech (TTS) systems (MeloTTS,...
- All existing databases of spoofed speech contain attack data that is spoofed in its entirety. In practice, it is entirely plausible that successful attacks can be mounted with utterances that are only partially spoofed. By definition, partially-spoofed utterances contain a mix of both spoofed and bona fide segments, which will likely degrade the performance of countermeasures trained with...
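
Several of the ASR entries above describe a file utt_spk_text.tsv holding a FileID, a UserID, and a transcription. As a rough illustration of how such a file might be read, here is a minimal Python sketch. It assumes one tab-separated row per utterance, columns in the order the descriptions list them, and no header row; none of these details are confirmed by the listings. The sample rows and the names `Utterance`, `read_utt_spk_text`, and `group_by_speaker` are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Utterance(NamedTuple):
    file_id: str   # FileID column
    user_id: str   # UserID column (anonymized in some data sets)
    text: str      # transcription of the audio


def read_utt_spk_text(lines: Iterable[str]) -> List[Utterance]:
    """Parse rows assumed to be: FileID <tab> UserID <tab> transcription."""
    # QUOTE_NONE keeps quotation marks inside transcriptions as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    utterances = []
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed rows rather than guess
        file_id, user_id, text = (field.strip() for field in row)
        utterances.append(Utterance(file_id, user_id, text))
    return utterances


def group_by_speaker(utterances: Iterable[Utterance]) -> Dict[str, List[str]]:
    """Map each UserID to the list of FileIDs recorded by that speaker."""
    by_speaker = defaultdict(list)
    for utt in utterances:
        by_speaker[utt.user_id].append(utt.file_id)
    return dict(by_speaker)


# Made-up rows for illustration only; real IDs and text will differ.
sample = io.StringIO(
    "abc123\tspk01\thello world\n"
    "abc124\tspk02\tgood morning\n"
    "broken row without tabs\n"
    "abc125\tspk01\tthank you\n"
)
utts = read_utt_spk_text(sample)
groups = group_by_speaker(utts)
```

For a real download, the same function could be given an open file handle, for example `open("utt_spk_text.tsv", encoding="utf-8", newline="")`. Since the listings note that errors may remain after quality checking, skipping malformed rows instead of failing seems a reasonable default.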