Your search
Results: 78 resources
- Welcome to our interactive International Phonetic Association (IPA) chart website! Clicking an IPA symbol on our charts lets you listen to its sound and see the vocal-organ movements, imaged with ultrasound, MRI, or animation. To find out more about how our IPA charts were made, click the buttons on the left-hand side of this page. The website contains two main...
- Dynamic Dialects contains an articulatory video-based corpus of speech samples from worldwide accents of English. Videos in this corpus contain synchronised audio, ultrasound tongue-imaging video, and video of the moving lips. We are continuing to augment the database. The website contains three main resources: - A clickable Accent Map: clicking on points of the map will open up links to...
- This is a corpus of articulatory data in different forms (EMA, MRI, video, 3D scans of the upper/lower jaw, audio, etc.) acquired from one male British English speaker.
- The USC Speech and Vocal Tract Morphology MRI Database consists of real-time magnetic resonance images of dynamic vocal tract shaping during read and spontaneous speech, with concurrently recorded denoised audio, and 3D volumetric MRI of vocal tract shapes during vowels and continuant consonants sustained for 7 seconds, from 17 speakers.
- USC-EMO-MRI is an emotional speech production database which includes real-time magnetic resonance imaging data with synchronized speech audio from five male and five female actors, each producing a passage and a set of sentences in multiple repetitions, while enacting four different target emotions (neutral, happy, angry, sad). The database includes emotion quality evaluation from at least...
- USC-TIMIT is a database of speech production data under ongoing development, which currently includes real-time magnetic resonance imaging data from five male and five female speakers of American English, and electromagnetic articulography data from four of these speakers. The two modalities were recorded in two independent sessions while the subjects produced the same 460-sentence corpus. In...
- Real-time magnetic resonance imaging (RT-MRI) of human speech production is enabling significant advances in speech science, linguistics, bio-inspired speech technology development, and clinical applications. Easy access to RT-MRI is, however, limited, and comprehensive datasets with broad access are needed to catalyze research across numerous domains. The imaging of the rapidly moving...
- These transcripts and video files are samples of Spanish and English caregiver-child interaction (the caregiver is almost always the mother), collected at child ages 2½, 3, and 3½ years as part of a 10-year longitudinal study of the language and literacy development of U.S.-born children raised in Spanish-speaking homes. Each recording is approximately 30 minutes long. The caregiver and target child are...
- The MSP-Podcast corpus contains speech segments from podcast recordings, perceptually annotated using crowdsourcing. Collection of this corpus is ongoing. Version 1.11 of the corpus has 151,654 speaking turns (237 hours and 56 minutes). The proposed partition attempts to create speaker-independent Train, Development, Test1, Test2, and Test3 sets.
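A speaker-independent partition, as described above, means that all segments from a given speaker land in exactly one split, so test speakers are never seen in training. A minimal sketch of that grouping logic (the segment and speaker IDs here are invented for illustration; this is not the official MSP-Podcast partition tooling):

```python
from collections import defaultdict

def speaker_independent_split(segments, dev_speakers, test_speakers):
    """Assign (segment_id, speaker_id) pairs to Train/Development/Test
    so that no speaker appears in more than one split."""
    splits = defaultdict(list)
    for seg_id, spk in segments:
        if spk in test_speakers:
            splits["Test"].append(seg_id)
        elif spk in dev_speakers:
            splits["Development"].append(seg_id)
        else:
            splits["Train"].append(seg_id)
    return dict(splits)

# Toy data: two segments from spk_a, one each from three other speakers.
segments = [("seg1", "spk_a"), ("seg2", "spk_b"), ("seg3", "spk_a"),
            ("seg4", "spk_c"), ("seg5", "spk_d")]
splits = speaker_independent_split(segments,
                                   dev_speakers={"spk_c"},
                                   test_speakers={"spk_d"})
```

Because the assignment is keyed on the speaker, both of spk_a's segments necessarily end up in the same split.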
- CREMA-D is a dataset of 7,442 original clips from 91 actors: 48 male and 43 female, aged 20 to 74, from a variety of races and ethnicities (African American, Asian, Caucasian, Hispanic, and Unspecified). Actors spoke from a selection of 12 sentences. The sentences were presented using one of six different emotions (Anger, Disgust, Fear, Happy,...
- The British National Corpus (BNC) is a 100-million-word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of British English, both spoken and written, from the late twentieth century. Access the data here: https://llds.ling-phil.ox.ac.uk/llds/xmlui/handle/20.500.14106/2554
- AudioSet, a sound vocabulary and dataset, consists of an expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTube videos. The ontology is specified as a hierarchical graph of event categories, covering a wide range of human and animal sounds, musical instruments and genres, and common everyday environmental sounds. By...
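AudioSet's clips are distributed as CSV lists pairing a YouTube ID with a 10-second time window and the ontology label IDs assigned to that window. A minimal sketch of reading such a file, assuming that four-column layout (the rows below are made up for illustration, not real dataset entries):

```python
import csv

# A miniature AudioSet-style segments file: YouTube ID, clip start and
# end (seconds), and a quoted comma-separated list of ontology label IDs.
sample = """# Segments csv
-abc123def45, 30.000, 40.000, "/m/09x0r,/m/05zppz"
-xyz987ghi65, 0.000, 10.000, "/m/04rlf"
"""

def parse_segments(text):
    """Parse comment-prefixed segment CSV text into clip records."""
    rows = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        ytid, start, end, labels = next(
            csv.reader([line], skipinitialspace=True))
        rows.append({"ytid": ytid,
                     "start": float(start),
                     "end": float(end),
                     "labels": labels.split(",")})
    return rows

clips = parse_segments(sample)
```

Note that the label field must be parsed with a CSV reader rather than a plain `split(",")`, since the quoted label list itself contains commas.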
- The CSTR VCTK Corpus includes speech data uttered by 110 English speakers with various accents. Each speaker reads out about 400 sentences, selected from a newspaper, the rainbow passage, and an elicitation paragraph used for the Speech Accent Archive.
- VoxForge is an open speech dataset set up to collect transcribed speech for use with free and open-source speech recognition engines (on Linux, Windows, and Mac).
- Common Voice is a project to help make voice recognition open to everyone. Developers need an enormous amount of voice data to build voice recognition technologies, and currently most of that data is expensive and proprietary. We want to make voice data freely and publicly available, and make sure the data represents the diversity of real people. Together we can make voice recognition better for everyone.
Explore
Audio Data
- Accent/Region (9)
  - African French (1)
  - American English (2)
  - Arabic (1)
  - British English (4)
  - World Englishes (2)
- Child Speech (10)
- Conversation (15)
- Electroglottography / Electrolaryngography (1)
- Emotional Speech (3)
- Forensic (5)
- Language (22)
  - African Languages (1)
  - Arabic (1)
  - Bi-/Multilingual (1)
  - English (12)
  - French (2)
  - German (1)
  - Korean (1)
  - L2+ (1)
  - Language Learning (2)
  - Mandarin (1)
  - Multiple (2)
  - Spanish (1)
- Pathological (8)
- Singing (2)
- Speech in Noise (5)
- Synthetic Speech (8)
Derived & Measured Data
- Vocal Tract (1)
Speech Production Data
- Articulography (3)
- Brain Imaging (1)
- EEG (1)
- MRI (9)
- Ultrasound (10)
- Video (3)
- Vocal Anatomy (10)
  - Larynx and Glottis (1)
  - Mandible and Maxilla (1)
  - Vocal Tract (8)
Teaching Resources
Tags
- audio data
- transcribed (31)
- adult (24)
- English (23)
- male (22)
- read speech (21)
- female (18)
- spontaneous speech (15)
- child speech (11)
- conversation (11)
- synthetic speech (8)
- MRI (8)
- speech-language pathology (8)
- real-time MRI (rtMRI) (7)
- ultrasound (7)
- interview (7)
- deepfake (6)
- video (5)
- articulatory data (5)
- inter-speaker variability (5)
- older adult (4)
- volumetric MRI (4)
- Mandarin (4)
- forensic (4)
- telephone (4)
- British (3)
- perceptually annotated (3)
- American English (3)
- electromagnetic articulography (EMA) (3)
- speech production (3)
- vowels (3)
- ultrasound tongue imaging (UTI) (3)
- French (3)
- annotated (3)
- speech sound disorder (3)
- L2 English (3)
- text-to-speech (TTS) (3)
- speech in noise (3)
- open-source (2)
- English accents (2)
- singing (2)
- angry (2)
- audiovisual (2)
- emotional speech (2)
- happy (2)
- sad (2)
- Spanish (2)
- articulation (2)
- multimodal (2)
- vocal tract shape (2)
- lip video (2)
- teaching resource (2)
- sociophonetic (2)
- pathological speech (2)
- multi-language (2)
- held vowel (2)
- map task (2)
- Australian (2)
- voice conversion (VC) (2)
- Sudanese (2)
- Nepali (2)
- Javanese (2)
- Bengali (2)
- British English (2)
- logical access (1)
- physical access (1)
- spoof (1)
- speech recognition (1)
- rainbow passage (1)
- labelled (1)
- non-speech (1)
- disgust (1)
- podcast (1)
- bilingual (1)
- child-centered audio (1)
- mother-child interaction (1)
- consonants (1)
- dentition (1)
- mandible (1)
- maxilla (1)
- International Phonetic Alphabet (IPA) (1)
- Arabic (1)
- accent variability (1)
- dialect variability (1)
- dysarthria (1)
- Derby (1)
- Leeds (1)
- Manchester (1)
- Newcastle (1)
- York (1)
- digits (1)
- whisper (1)
- Ohio (1)
- phonetic labels (1)
- Amyotrophic Lateral Sclerosis (ALS) (1)
- Down syndrome (1)
- Parkinson's disease (1)
- cerebral palsy (1)
- stroke (1)
- stutter (1)
- longitudinal (1)
- typically developing (1)
- cleft (1)
- L2 speech (1)
- language learning (1)
- DICOM (1)
- electroglottography (EGG) (1)
- intraoral pressure (1)
- validation (1)
- tenor (1)
- vibrato (1)
- Scottish English (1)
- coarticulation (1)
- within-speaker variability (1)
- brain activity (1)
- fMRI (1)
- vocal imitation (1)
- professional voice (1)
- silent speech (1)
- sociolinguistic (1)
- World Englishes (1)
- dyadic (1)
- Southern standard British English (SSBE) (1)
- Bradford (1)
- Kirklees (1)
- Wakefield (1)
- West Yorkshire (1)
- Putonghua (1)
- 3D head meshes (1)
- German (1)
- acoustic pharyngometry (1)
- electroencephalography (EEG) (1)
- external craniofacial anthropometry (1)
- rhinometry (1)
- syllable sequences (1)
- partial spoof (1)
- phone-level alignment (1)
- African (1)
- Cameroon (1)
- Chad (1)
- Congo (1)
- Gabon (1)
- Niger (1)
- Chinese (1)
- Amharic (1)
- Swahili (1)
- Wolof (1)
- Korean (1)
- Sinhala (1)
- Khmer (1)
- Afrikaans (1)
- Sesotho (1)
- Setswana (1)
- isiXhosa (1)
- Spanish accent (1)
- Czech (1)
- Japanese (1)
Resource type
- Dataset (70)
- Journal Article (3)
- Report (1)
- Web Page (4)