Results: 28 resources
- The MSP-AVW is an audiovisual whisper corpus for audiovisual speech recognition purposes. The MSP-AVW corpus contains data from 20 female and 20 male speakers. For each subject, three sessions are recorded, consisting of read sentences, isolated digits and spontaneous speech. The data is recorded under neutral and whisper conditions. The corpus was collected in a 13ft x 13ft ASHA certified...
- This 3-year project investigates language change in five urban dialects of Northern England: Derby, Newcastle, York, Leeds and Manchester. Data collection method: Linguistic analysis of speech data (conversational, word list) from samples of different northern English urban communities. Data collection consisted of interviews, which included (1) some structured questions about the interviewee...
- Ultrasound imaging has been widely adopted in speech research to visualize dynamic tongue movements during speech production. These images are commonly used as visual feedback in interventions for articulation disorders or as visual cues in speech recognition. Nevertheless, high-quality audio-ultrasound datasets remain scarce. The present study, therefore, aims to...
- The use of real-time magnetic resonance imaging (rt-MRI) of speech is increasing in clinical practice and speech science research. Analysis of such images often requires segmentation of articulators and the vocal tract, and the community is turning to deep-learning-based methods to perform this segmentation. While there are publicly available rt-MRI datasets of speech,...
- The study of articulatory gestures has a wide spectrum of applications, notably in speech production and recognition. Sets of phonemes, as well as their articulation, are language-specific; however, existing MRI databases mostly include English speakers. In the present work, we introduce a dataset acquired with MRI from 10 healthy native French speakers. A corpus...
- This database contains two non-contemporaneous recordings of each of 68 female speakers of Standard Chinese (a.k.a. Mandarin and Putonghua). 60 of the speakers are from north-eastern China, and 8 are from southern China. Each speaker was recorded in three speaking styles:
  - casual telephone conversation (cnv)
  - information exchange task over the telephone (fax)
  - pseudo-police-style interview (int)
- Forensic database of voice recordings of 500+ Australian English speakers (AusEng 500+). This database contains 3899 recordings totalling 310 hours of speech from 555 Australian English speakers. 324 female speakers:
  - 91 recorded in one recording session
  - 69 recorded in two separate recording sessions
  - 159 recorded in three recording sessions
  - 5 recorded in more than three recording...
- The USC Speech and Vocal Tract Morphology MRI Database consists of real-time magnetic resonance images of dynamic vocal tract shaping during read and spontaneous speech with concurrently recorded denoised audio, and 3D volumetric MRI of vocal tract shapes during vowels and continuant consonants sustained for 7 seconds, from 17 speakers.
- USC-EMO-MRI is an emotional speech production database which includes real-time magnetic resonance imaging data with synchronized speech audio from five male and five female actors, each producing a passage and a set of sentences in multiple repetitions, while enacting four different target emotions (neutral, happy, angry, sad). The database includes emotion quality evaluation from at least...
- USC-TIMIT is a database of speech production data under ongoing development, which currently includes real-time magnetic resonance imaging data from five male and five female speakers of American English, and electromagnetic articulography data from four of these speakers. The two modalities were recorded in two independent sessions while the subjects produced the same 460-sentence corpus. In...
- Real-time magnetic resonance imaging (RT-MRI) of human speech production is enabling significant advances in speech science, linguistics, bio-inspired speech technology development, and clinical applications. Easy access to RT-MRI is, however, limited, and comprehensive datasets with broad access are needed to catalyze research across numerous domains. The imaging of the rapidly moving...
- CREMA-D is a data set of 7,442 original clips from 91 actors. These clips were from 48 male and 43 female actors between the ages of 20 and 74, coming from a variety of races and ethnicities (African American, Asian, Caucasian, Hispanic, and Unspecified). Actors spoke from a selection of 12 sentences. The sentences were presented using one of six different emotions (Anger, Disgust, Fear, Happy,...
- The Voices Obscured in Complex Environmental Settings (VOiCES) corpus is a Creative Commons speech dataset targeting acoustically challenging and reverberant environments, with robust labels and truth data for transcription, denoising, and speaker identification. This is one of the largest corpora to date with transcriptions and simultaneously recorded real-world noise. The details: -...
Explore
Audio
- Accent/Region (3)
- Australian English (1)
- British English (2)
- Child Speech (6)
- Conversation (2)
- Emotional Speech (2)
- Forensic (2)
- Language (9)
- English (5)
- French (1)
- Language Learning (1)
- Mandarin (2)
- Multi-Speaker (5)
- Multi-Style (1)
- Pathological (7)
- Speech in Noise (1)
Derived & Measured Data
- Formant Measurements (4)
- Fundamental Frequency (1)
- Subglottal Tract (1)
- Vocal Tract (2)
- Vocal Tract Resonances (1)
- Voice Quality Measures (1)
Software, Processing & Utilities
Speech Production & Articulation
- Articulography (1)
- Brain Imaging (1)
- MRI (7)
- Ultrasound (6)
- Video (1)
- X-Ray (1)
Vocal Anatomy
- Mechanical Properties (1)
- Vocal Tract (6)
Tags
- female
- male (26)
- adult (20)
- audio data (18)
- read speech (10)
- English (9)
- speech-language pathology (7)
- MRI (6)
- child speech (6)
- ultrasound (5)
- transcribed (4)
- articulatory data (4)
- real-time MRI (rtMRI) (4)
- vowels (4)
- volumetric MRI (3)
- American English (3)
- speech production (3)
- rtMRI (3)
- formant measurement (3)
- speech sound disorder (3)
- angry (2)
- audiovisual (2)
- emotional speech (2)
- happy (2)
- older adult (2)
- sad (2)
- video (2)
- multimodal (2)
- vocal tract shape (2)
- conversation (2)
- interview (2)
- spontaneous speech (2)
- Mandarin (2)
- Newcastle (2)
- pathological speech (2)
- environmental noise (1)
- noisy audio (1)
- reverberation (1)
- disgust (1)
- articulation (1)
- electromagnetic articulography (EMA) (1)
- perceptually annotated (1)
- consonants (1)
- Australian (1)
- forensic (1)
- Putonghua (1)
- telephone (1)
- French (1)
- segmentation (1)
- British (1)
- Derby (1)
- English accents (1)
- Leeds (1)
- Manchester (1)
- York (1)
- digits (1)
- whisper (1)
- British English (1)
- phonetic labels (1)
- typically developing (1)
- x-ray (1)
- x-ray microbeam (1)
- antiresonance (1)
- impedance (1)
- vocal tract length (1)
- vocal tract resonance (1)
- subglottal tract (1)
- child (1)
- fundamental frequency (1)
- dysarthria (1)
- ultrasound tongue imaging (UTI) (1)
- stutter (1)
- cleft (1)
- liquids (1)
- vocal tract area function (1)
- CAPE-V (1)
- GRBAS (1)
- annotated (1)
- clinical (1)
- voice quality (1)
- brain activity (1)
- fMRI (1)
- vocal imitation (1)
Resource type
- Dataset (19)
- Journal Article (8)
- Web Page (1)