Results: 16 resources
-
Korean Open-source Speech Corpus for Speech Recognition by Zeroth Project. The dataset contains transcribed audio data for Korean: 51.6 hours of transcribed Korean audio for training (22,263 utterances, 105 people, 3000 sentences) and 1.2 hours of transcribed Korean audio for testing (457 utterances, 10 people). This corpus also contains a pre-trained/designed language model,...
-
This dataset contains transcribed speech in Amharic, Swahili, and Wolof.
-
For over half a century, the UCLA Phonetics Laboratory has collected recordings of hundreds of languages from around the world, providing source materials for phonetic and phonological research, of value to scholars, speakers of the languages, and language learners alike. The materials on this site comprise audio recordings illustrating phonetic structures from over 200 languages with phonetic...
-
This dataset contains simultaneous recordings of electroglottography (EGG recorded with Glottal Enterprises EG2-PCX2), unfiltered audio, and intraoral pressure (recorded with Glottal Enterprises PG-60) from 14 subjects. It is meant to facilitate the validation of physical models of glottal control during voicing, in which the glottal/source waveform for speech is controlled by a combination of...
-
We introduce the Speak & Improve Corpus 2025, a dataset of L2 learner English data with holistic scores and language error annotation, collected from open (spontaneous) speaking tests on the Speak & Improve learning platform. The aim of the corpus release is to address a major challenge to developing L2 spoken language processing systems, the lack of publicly available data with high-quality...
-
The MSP-AVW is an audiovisual whisper corpus for audiovisual speech recognition. The corpus contains data from 20 female and 20 male speakers. For each subject, three sessions are recorded, consisting of read sentences, isolated digits, and spontaneous speech. The data is recorded under neutral and whisper conditions. The corpus was collected in a 13ft x 13ft ASHA certified...
-
Ultrasound imaging has been widely adopted in speech research to visualize dynamic tongue movements during speech production. These images are universally used as visual feedback in interventions for articulation disorders or as visual cues in speech recognition. Nevertheless, high-quality audio-ultrasound datasets remain scarce. The present study, therefore, aims to...
-
The study of articulatory gestures has a wide spectrum of applications, notably in speech production and recognition. Sets of phonemes, as well as their articulation, are language-specific; however, existing MRI databases mostly include English speakers. In our present work, we introduce a dataset acquired with MRI from 10 healthy native French speakers. A corpus...
-
The MSP-Conversation corpus contains interactions annotated with time-continuous emotional traces for arousal (calm to active), valence (negative to positive), and dominance (weak to strong). Time-continuous annotations offer the flexibility to explore emotional displays at different temporal resolutions while leveraging contextual information. Release 1.0 contains 74 conversations with...
-
These transcripts and video files are samples of Spanish and English caregiver (almost always mother)-child interaction collected at child ages 2 ½, 3, and 3 ½ years as part of a 10-year longitudinal study of the language and literacy development of U.S.-born children raised in Spanish-speaking homes. Each recording is approximately 30 minutes in length. The caregiver and target child are...
-
The MSP-Podcast corpus contains speech segments from podcast recordings which are perceptually annotated using crowdsourcing. The collection of this corpus is an ongoing process. Version 1.11 of the corpus has 151,654 speaking turns (237 hours and 56 mins). The proposed partition attempts to create speaker-independent datasets for Train, Development, Test1, Test2, and Test3 sets.
-
The British National Corpus (BNC) is a 100 million word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of British English, both spoken and written, from the late twentieth century. Access the data here: https://llds.ling-phil.ox.ac.uk/llds/xmlui/handle/20.500.14106/2554
-
A sound vocabulary and dataset. AudioSet consists of an expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTube videos. The ontology is specified as a hierarchical graph of event categories, covering a wide range of human and animal sounds, musical instruments and genres, and common everyday environmental sounds. By...
-
This CSTR VCTK Corpus includes speech data uttered by 110 English speakers with various accents. Each speaker reads out about 400 sentences, which were selected from a newspaper, the rainbow passage and an elicitation paragraph used for the speech accent archive.