Full catalogue: 113 resources
- USC-TIMIT is a database of speech production data under ongoing development, which currently includes real-time magnetic resonance imaging data from five male and five female speakers of American English, and electromagnetic articulography data from four of these speakers. The two modalities were recorded in two independent sessions while the subjects produced the same 460-sentence corpus. In...
- We have been collecting real-time MRI data from phoneticians producing the sounds of the International Phonetic Alphabet, together with standard sentences and texts. The collected data can be accessed via the resource's own pages.
- Real-time magnetic resonance imaging (RT-MRI) of human speech production is enabling significant advances in speech science, linguistics, bio-inspired speech technology development, and clinical applications. Easy access to RT-MRI is, however, limited, and comprehensive datasets with broad access are needed to catalyze research across numerous domains. The imaging of the rapidly moving...
- A Praat script that automatically detects syllable nuclei in order to measure speech rate without the need for a transcription. Peaks in intensity (dB) that are preceded and followed by dips in intensity are considered potential syllable nuclei; the script subsequently discards peaks that are not voiced (a minimal sketch of this peak-picking idea follows the list).
- SpeechBox is a set of multiple speech resources. Each will be added to the YorVoice Data Catalogue as an individual, searchable resource soon.
- These transcripts and video files are samples of Spanish and English caregiver-child interaction (the caregiver is almost always the mother), collected at child ages 2½, 3, and 3½ years as part of a 10-year longitudinal study of the language and literacy development of U.S.-born children raised in Spanish-speaking homes. Each recording is approximately 30 minutes in length. The caregiver and target child are...
- The MSP-Podcast corpus contains speech segments from podcast recordings, perceptually annotated using crowdsourcing. Collection of this corpus is an ongoing process. Version 1.11 of the corpus has 151,654 speaking turns (237 hours and 56 minutes). The proposed partition attempts to create speaker-independent Train, Development, Test1, Test2, and Test3 sets (a sketch of such a speaker-independent split follows the list).
- This dataset contains 350 parallel utterances spoken by 10 native Mandarin speakers and 10 English speakers in 5 emotional states (neutral, happy, angry, sad, and surprise). Transcripts are provided.
- CREMA-D is a dataset of 7,442 original clips from 91 actors: 48 male and 43 female actors between the ages of 20 and 74, from a variety of races and ethnicities (African American, Asian, Caucasian, Hispanic, and Unspecified). Actors spoke from a selection of 12 sentences. The sentences were presented using one of six different emotions (Anger, Disgust, Fear, Happy,...
- The British National Corpus (BNC) is a 100-million-word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of British English, both spoken and written, from the late twentieth century. Access the data here: https://llds.ling-phil.ox.ac.uk/llds/xmlui/handle/20.500.14106/2554
- The Voices Obscured in Complex Environmental Settings (VOiCES) corpus is a Creative Commons speech dataset targeting acoustically challenging and reverberant environments, with robust labels and truth data for transcription, denoising, and speaker identification. This is one of the largest corpora to date with transcriptions and simultaneously recorded real-world noise. The details: ...
- AudioSet is a sound vocabulary and dataset consisting of an expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTube videos. The ontology is specified as a hierarchical graph of event categories, covering a wide range of human and animal sounds, musical instruments and genres, and common everyday environmental sounds (a sketch of walking this ontology follows the list). By...
- Open SLR is a set of multiple speech resources. Each will be added to the YorVoice Data Catalogue as an individual, searchable resource soon.
- The CSTR VCTK Corpus includes speech data uttered by 110 English speakers with various accents. Each speaker reads out about 400 sentences, which were selected from a newspaper, the rainbow passage, and an elicitation paragraph used for the speech accent archive.
- VoxForge is an open speech dataset that was set up to collect transcribed speech for use with Free and Open Source Speech Recognition Engines (on Linux, Windows and Mac).
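For the syllable-nuclei Praat script above, the core peak-picking idea can be illustrated with a short sketch. This is not the Praat script itself: it is a minimal Python illustration that assumes a frame-level intensity contour (in dB) and a per-frame voicing decision have already been extracted, and the 2 dB dip threshold is an assumption rather than the script's actual setting.

```python
import numpy as np
from scipy.signal import find_peaks

def count_syllable_nuclei(intensity_db, voiced, dip_threshold_db=2.0):
    """Count syllable nuclei from a frame-level intensity contour (dB).

    A frame is a candidate nucleus if it is an intensity peak whose
    surrounding dips are at least `dip_threshold_db` below it; candidates
    on unvoiced frames are discarded, mirroring the description above
    (the threshold value here is an assumption).
    """
    # `prominence` enforces the "preceded and followed by dips" condition.
    peaks, _ = find_peaks(intensity_db, prominence=dip_threshold_db)
    # Keep only peaks that coincide with voiced frames.
    return int(np.sum(voiced[peaks]))

# Toy example: speech rate = nuclei / duration of the analysed stretch.
frames = np.array([40, 55, 42, 60, 41, 58, 43], dtype=float)   # dB per frame
voicing = np.array([0, 1, 0, 1, 0, 0, 0], dtype=bool)
nuclei = count_syllable_nuclei(frames, voicing)
print(nuclei / 0.07)  # nuclei per second for a 70 ms toy stretch
```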
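Speaker independence in the MSP-Podcast partition above means that no speaker contributes turns to more than one of the Train, Development, Test1, Test2, and Test3 sets. The corpus ships its own official partition, so the sketch below only illustrates how such a split can be constructed; the set fractions and the (utterance_id, speaker_id) input format are assumptions.

```python
import random
from collections import defaultdict

# Illustrative fractions for the five named partitions; these numbers are
# assumptions for the sketch, not the corpus's official partition.
SPLIT_FRACTIONS = {"Train": 0.6, "Development": 0.1,
                   "Test1": 0.1, "Test2": 0.1, "Test3": 0.1}

def speaker_independent_split(utterances, fractions=SPLIT_FRACTIONS, seed=0):
    """Assign whole speakers (never individual turns) to each partition,
    so no speaker appears in more than one set.

    `utterances` is an iterable of (utterance_id, speaker_id) pairs.
    """
    by_speaker = defaultdict(list)
    for utt_id, spk_id in utterances:
        by_speaker[spk_id].append(utt_id)

    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)

    splits, start = {}, 0
    for name, frac in fractions.items():
        end = start + round(frac * len(speakers))
        splits[name] = [u for s in speakers[start:end] for u in by_speaker[s]]
        start = end
    # Any speakers left over by rounding go to the last partition.
    splits[name].extend(u for s in speakers[start:] for u in by_speaker[s])
    return splits
```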
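The AudioSet ontology above is distributed as a JSON description of the hierarchical graph of event classes. The sketch below shows one way to walk a subtree of that graph; the field names ("id", "name", "child_ids") and the example class identifier are assumptions to verify against the released ontology file.

```python
import json

def print_subtree(ontology_path, root_id):
    """Recursively print an ontology subtree, indented by depth.

    Assumes each entry in the ontology JSON is a dict with "id", "name"
    and "child_ids" keys (check against the released file).
    """
    with open(ontology_path) as f:
        nodes = {n["id"]: n for n in json.load(f)}

    def walk(node_id, depth):
        node = nodes[node_id]
        print("  " * depth + node["name"])
        for child_id in node.get("child_ids", []):
            walk(child_id, depth + 1)

    walk(root_id, 0)

# Example call (root identifier shown here is illustrative):
# print_subtree("ontology.json", "/m/09x0r")
```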
Explore
Audio
- Accent/Region (13)
  - American English (2)
  - Arabic (1)
  - Australian English (2)
  - British English (6)
  - World Englishes (3)
- Child Speech (9)
- Conversation (9)
- Directed Speech (1)
- Electroglottography / Electrolaryngography (1)
- Emotional Speech (5)
- Forensic (5)
- Language (27)
  - Arabic (1)
  - Bi-/Multilingual (1)
  - English (19)
  - French (1)
  - L2+ (1)
  - Language Learning (2)
  - Mandarin (3)
  - Multiple (2)
  - Spanish (1)
- Multi-Speaker (18)
- Multi-Style (2)
- Pathological (9)
- Singing (2)
- Speech in Noise (3)
- Synthetic Speech (2)
Benchmarks & Validation
- Glottis (2)
Derived & Measured Data
- Formant Measurements (7)
- Fundamental Frequency (2)
- Phone-Level Alignments (1)
- Subglottal Tract (3)
- Vocal Tract (10)
- Vocal Tract Resonances (1)
- Voice Quality Measures (1)
Software, Processing & Utilities
- Articulatory Data Processing (2)
- Feature Extraction (4)
- Image and Volume Segmentation (3)
- Numerical Acoustic Modelling (3)
- Phone Apps (1)
- Speech Processing (5)
- Transcription (3)
- Utilities (4)
Speech Production & Articulation
- Articulography (2)
- Brain Imaging (1)
- MRI (11)
- Ultrasound (10)
- Video (3)
- X-Ray (1)
Teaching Resources
- 3D Models (2)
- Articulation Data (3)
- Tutorials (2)
- Videos (2)
Vocal Anatomy
- Hyoid (1)
- Larynx and Glottis (3)
- Mandible (2)
- Mechanical Properties (1)
- Vocal Tract (11)
Tags
- audio data (46)
- adult (40)
- male (33)
- female (28)
- read speech (23)
- English (23)
- transcribed (13)
- vowels (11)
- MRI (11)
- formant measurement (10)
- spontaneous speech (10)
- child speech (10)
- speech-language pathology (9)
- speech processing (7)
- video (7)
- ultrasound (7)
- teaching resource (6)
- interview (6)
- real-time MRI (rtMRI) (6)
- conversation (6)
- child (6)
- MATLAB (5)
- open-source (5)
- articulatory data (5)
- volumetric MRI (5)
- American English (5)
- vocal tract shape (5)
- segmentation (5)
- automatic speech recognition (ASR) (4)
- speech recognition (4)
- emotional speech (4)
- rtMRI (4)
- annotated (4)
- vocal tract area function (4)
- STL files (3)
- forensic (3)
- telephone (3)
- speaker diarization (3)
- audio processing (3)
- transcription (3)
- Python (3)
- English accents (3)
- British (3)
- angry (3)
- happy (3)
- older adult (3)
- sad (3)
- Mandarin (3)
- perceptually annotated (3)
- speech production (3)
- ultrasound tongue imaging (UTI) (3)
- Newcastle (3)
- DICOM (3)
- computed tomography (CT) (3)
- pathological speech (3)
- speech sound disorder (3)
- numerical acoustic modelling (3)
- source-filter model (2)
- tube model (2)
- Praat (2)
- phonetics (2)
- child-centered audio (2)
- audio (2)
- convert (2)
- file format (2)
- feature extraction (2)
- speech to text (2)
- speech activity detection (2)
- voice activity detection (2)
- whisper (2)
- synthetic speech (2)
- singing (2)
- audiovisual (2)
- articulation (2)
- multimodal (2)
- International Phonetic Alphabet (IPA) (2)
- electromagnetic articulography (EMA) (2)
- lip video (2)
- sociophonetic (2)
- Australian (2)
- phonetic labels (2)
- British English (2)
- L2 English (2)
- finite element method (FEM) (2)
- mandible (2)
- impedance (2)
- vocal tract length (2)
- subglottal tract (2)
- fundamental frequency (2)
- benchmark (2)
- glottis (2)
- videoendoscopy (2)
- multi-language (2)
- 3D print (1)
- Southern standard British English (SSBE) (1)
- map task (1)
- TextGrid (1)
- software (1)
- spectrogram (1)
- speech analysis (1)
- language development (1)
- language environment analysis (LENA) (1)
- word count estimation (1)
- record (1)
- stream (1)
- cepstral peak prominence (CPP) (1)
- harmonic-to-noise ratio (HNR) (1)
- C++ (1)
- classification (1)
- emotion recognition (1)
- speaker identification (1)
- conversational AI (1)
- overlapped speech detection (1)
- speaker embedding (1)
- anechoic (1)
- fast speech (1)
- high pitch (1)
- loud speech (1)
- low pitch (1)
- shout (1)
- slow speech (1)
- deepfake (1)
- logical access (1)
- physical access (1)
- spoof (1)
- speaker detection (1)
- two-class recognizer (1)
- rainbow passage (1)
- labelled (1)
- non-speech (1)
- environmental noise (1)
- noisy audio (1)
- reverberation (1)
- disgust (1)
- surprise (1)
- podcast (1)
- Spanish (1)
- bilingual (1)
- mother-child interaction (1)
- speech rate (1)
- syllable (1)
- syllable nuclei (1)
- consonants (1)
- jaw scans (1)
- accent map (1)
- speech synthesis (1)
- Arabic (1)
- accent variability (1)
- dialect variability (1)
- arousal (1)
- dominance (1)
- valence (1)
- Putonghua (1)
- image processing (1)
- French (1)
- Derby (1)
- Leeds (1)
- Manchester (1)
- York (1)
- digits (1)
- Ohio (1)
- Non-native speech (1)
- adaptation (1)
- diapix (1)
- Middlesbrough (1)
- Sunderland (1)
- speech acoustics (1)
- longitudinal (1)
- formant tracking (1)
- anatomy (1)
- app (1)
- larynx (1)
- typically developing (1)
- x-ray (1)
- x-ray microbeam (1)
- L2 speech (1)
- language learning (1)
- electroglottography (EGG) (1)
- intraoral pressure (1)
- validation (1)
- hyoid (1)
- antiresonance (1)
- vocal tract resonance (1)
- resonance (1)
- corner vowels (1)
- developmental trajectory (1)
- sexual dimorphism (1)
- loudness (1)
- subglottal pressure (1)
- back placement (1)
- chest resonance (1)
- classical (1)
- front placement (1)
- head resonance (1)
- open throat (1)
- roughness (1)
- tenor (1)
- vibrato (1)
- dysarthria (1)
- Amyotrophic Lateral Sclerosis (ALS) (1)
- Down syndrome (1)
- Parkinson's disease (1)
- cerebral palsy (1)
- stroke (1)
- stutter (1)
- cleft (1)
- liquids (1)
- nasals (1)
- plosives (1)
- morphometric (1)
- Lombard speech (1)
- clear speech (1)
- computer-directed speech (1)
- infant-directed speech (1)
- non-native-directed speech (1)
- speech in noise (1)
- Scottish English (1)
- coarticulation (1)
- within-speaker variability (1)
- phone duration (1)
- phone-level alignment (1)
- pitch (1)
- CAPE-V (1)
- GRBAS (1)
- clinical (1)
- voice quality (1)
- area function (1)
- vocal fold model (1)
- vocal tract transfer function (1)
- held vowel (1)
- brain activity (1)
- fMRI (1)
- vocal imitation (1)
- professional voice (1)
- silent speech (1)
- sociolinguistic (1)
- World Englishes (1)
- dyadic (1)
Resource type
- Conference Paper (1)
- Dataset (54)
- Journal Article (21)
- Preprint (2)
- Report (1)
- Software (19)
- Web Page (15)