Full catalogue 155 resources
- Forensic database of voice recordings of 500+ Australian English speakers (AusEng 500+). This database contains 3899 recordings totalling 310 hours of speech from 555 Australian-English speakers. 324 female speakers: 91 recorded in one recording session; 69 recorded in two separate recording sessions; 159 recorded in three recording sessions; 5 recorded in more than three recording...
- The MSP-Conversation corpus contains interactions annotated with time-continuous emotional traces for arousal (calm to active), valence (negative to positive), and dominance (weak to strong). Time-continuous annotations offer the flexibility to explore emotional displays at different temporal resolutions while leveraging contextual information. Release 1.0 contains 74 conversations with...
- Large-scale, weakly-supervised speech recognition models, such as Whisper, have demonstrated impressive results on speech recognition across domains and languages. However, their application to long audio transcription via buffered or sliding window approaches is prone to drifting, hallucination and repetition; and prohibits batched transcription due to their sequential nature. Further,...
- We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual and multitask supervision, the resulting models generalize well to standard benchmarks and are often competitive with prior fully supervised results but in a zero-shot transfer setting without the need for any...
- In this paper, we first provide a review of the state-of-the-art emotional voice conversion research, and the existing emotional speech databases. We then motivate the development of a novel emotional speech database (ESD) that addresses the increasing research need. With this paper, the ESD database is now made available to the research community. The ESD database consists of 350 parallel...
- Twenty-five countries have Arabic as an official language, but the dialects spoken vary greatly, and even within one country different accents are heard. Many features create the impression of 'a different accent', including how particular sounds are pronounced, where stress falls in a word, and what intonation pattern is used. There is extensive prior research on the first two of these for...
- Hi – my name is Simon King and this is my personal website for supporting my teaching. I am the Professor of Speech Processing at the University of Edinburgh, where I teach courses in speech processing and speech synthesis at advanced undergraduate and Masters level. Use of this website. Students: You may use this website freely for personal use. You may download copies of the content for your...
- Welcome to our interactive International Phonetic Association (IPA) chart website! Clicking on the IPA symbols on our charts will allow you to listen to their sounds and see vocal-organ movements imaged with ultrasound, MRI, or in animated form. To find out more about how our IPA charts were made, click on the buttons on the left-hand side of this page. The website contains two main...
- Dynamic Dialects contains an articulatory video-based corpus of speech samples from world-wide accents of English. Videos in this corpus contain synchronised audio, ultrasound-tongue-imaging video and video of the moving lips. We are continuing to augment the database. The website contains three main resources: - A clickable Accent Map: clicking on points of the map will open up links to...
- This is a corpus of articulatory data of different forms (EMA, MRI, video, 3D scans of upper/lower jaw, audio etc.) acquired from one male British English speaker.
- The USC Speech and Vocal Tract Morphology MRI Database consists of real-time magnetic resonance images of dynamic vocal tract shaping during read and spontaneous speech with concurrently recorded denoised audio, and 3D volumetric MRI of vocal tract shapes during vowels and continuant consonants sustained for 7 seconds, from 17 speakers.
- USC-EMO-MRI is an emotional speech production database which includes real-time magnetic resonance imaging data with synchronized speech audio from five male and five female actors, each producing a passage and a set of sentences in multiple repetitions, while enacting four different target emotions (neutral, happy, angry, sad). The database includes emotion quality evaluation from at least...
- USC-TIMIT is a database of speech production data under ongoing development, which currently includes real-time magnetic resonance imaging data from five male and five female speakers of American English, and electromagnetic articulography data from four of these speakers. The two modalities were recorded in two independent sessions while the subjects produced the same 460 sentence corpus. In...
- We have been collecting real-time MRI data from phoneticians producing the sounds of the International Phonetic Alphabet, together with standard sentences and texts. You may access the collected data by clicking on the pictures below.
- Real-time magnetic resonance imaging (RT-MRI) of human speech production is enabling significant advances in speech science, linguistics, bio-inspired speech technology development, and clinical applications. Easy access to RT-MRI is however limited, and comprehensive datasets with broad access are needed to catalyze research across numerous domains. The imaging of the rapidly moving...
Explore
Audio Data
- Accent/Region (3)
  - British English (2)
  - World Englishes (1)
- Accents (9)
- Child Speech (11)
- Conversation (17)
- Directed Speech (1)
- Electroglottography / Electrolaryngography (1)
- Emotional Speech (5)
- Forensic (5)
- Language (18)
  - African Languages (1)
  - Bi-/Multilingual (1)
  - English (11)
  - French (1)
  - German (1)
  - Korean (1)
  - L2+ (1)
  - Language Learning (1)
  - Mandarin (2)
  - Multiple (2)
  - Spanish (1)
- Pathological (9)
- Singing (3)
- Speech in Noise (7)
- Synthetic Speech (11)
Derived & Measured Data
- Formant Measurements (7)
- Fundamental Frequency (2)
- Phone-Level Alignments (1)
- Subglottal Tract (3)
- Vocal Tract (10)
- Voice Quality Measures (1)
Software, Processing & Utilities
- Feature Extraction (4)
- Image and Volume Segmentation (3)
- Numerical Acoustic Modelling (3)
- Phone Apps (1)
- Speech Processing (5)
- Transcription (3)
- Utilities (4)
Speech Perception Data
- Brain Imaging (2)
Speech Production Data
- Articulography (3)
- Brain Imaging (2)
- EEG (1)
- MRI (14)
- Ultrasound (10)
- Video (3)
- Vocal Anatomy (23)
  - Hyoid (1)
  - Larynx and Glottis (3)
  - Mandible and Maxilla (3)
  - Mechanical Properties (1)
  - Models (2)
  - Vocal Tract (13)
  - X-Ray (1)
Teaching Resources
- 3D Models (2)
- Articulation Data (3)
- Tutorials (2)
- Videos (2)
Tags
- audio data (78)
- adult (41)
- transcribed (36)
- male (34)
- English (30)
- female (29)
- read speech (25)
- spontaneous speech (16)
- magnetic resonance imaging (MRI) (13)
- child speech (12)
- real-time MRI (rtMRI) (12)
- conversation (12)
- vowels (11)
- synthetic speech (11)
- formant measurement (10)
- speech-language pathology (9)
- deepfake (8)
- vocal tract shape (8)
- speech processing (7)
- video (7)
- ultrasound (7)
- interview (7)
- individual variability (7)
- teaching resource (6)
- segmentation (6)
- child (6)
- volumetric MRI (6)
- MATLAB (5)
- open-source (5)
- older adult (5)
- Mandarin (5)
- American English (5)
- articulatory data (5)
- automatic speech recognition (ASR) (4)
- speech recognition (4)
- emotional speech (4)
- speech production (4)
- annotated (4)
- British English (4)
- vocal tract area function (4)
- speech in noise (4)
- text-to-speech (TTS) (4)
- French (4)
- forensic (4)
- telephone (4)
- functional magnetic resonance imaging (fMRI) (4)
- numerical acoustic modelling (3)
- STL files (3)
- speaker diarization (3)
- audio processing (3)
- transcription (3)
- Python (3)
- spoof (3)
- English accents (3)
- singing (3)
- British (3)
- angry (3)
- happy (3)
- sad (3)
- perceptually annotated (3)
- electromagnetic articulography (EMA) (3)
- MRI (3)
- pathological speech (3)
- ultrasound tongue imaging (UTI) (3)
- Newcastle (3)
- speech sound disorder (3)
- L2 English (3)
- computed tomography (CT) (3)
- mandible (3)
- DICOM (3)
- multi-language (3)
- Japanese (3)
- source-filter model (2)
- tube model (2)
- Praat (2)
- phonetics (2)
- child-centered audio (2)
- file format conversion (2)
- feature extraction (2)
- speech to text (2)
- speech activity detection (2)
- voice activity detection (2)
- whisper (2)
- audiovisual (2)
- Spanish (2)
- International Phonetic Alphabet (IPA) (2)
- vocal tract length (2)
- subglottal tract (2)
- fundamental frequency (2)
- benchmark (2)
- glottis (2)
- videoendoscopy (2)
- phone-level alignment (2)
- finite element method (FEM) (2)
- held vowel (2)
- voice conversion (VC) (2)
- Chinese (2)
- Sudanese (2)
- Nepali (2)
- Javanese (2)
- Bengali (2)
- map task (2)
- articulation (2)
- multimodal (2)
- lip video (2)
- sociophonetic (2)
- Australian (2)
- phonetic labels (2)
- speech perception (2)
- area function (1)
- vocal fold model (1)
- 3D print (1)
- TextGrid (1)
- software (1)
- spectrogram (1)
- speech analysis (1)
- language development (1)
- language environment analysis (LENA) (1)
- word count estimation (1)
- record audio (1)
- stream audio (1)
- cepstral peak prominence (CPP) (1)
- harmonic-to-noise ratio (HNR) (1)
- C++ (1)
- classification (1)
- emotion recognition (1)
- speaker identification (1)
- conversational AI (1)
- overlapped speech detection (1)
- speaker embedding (1)
- anechoic (1)
- fast speech (1)
- high pitch (1)
- loud speech (1)
- low pitch (1)
- shout (1)
- slow speech (1)
- logical access (1)
- physical access (1)
- speaker detection (1)
- two-class recognizer (1)
- rainbow passage (1)
- labelled (1)
- non-speech (1)
- environmental noise (1)
- noisy audio (1)
- reverberation (1)
- disgust (1)
- surprise (1)
- podcast (1)
- bilingual (1)
- mother-child interaction (1)
- speech rate (1)
- syllable (1)
- syllable nuclei (1)
- speech synthesis (1)
- image processing (1)
- dysarthria (1)
- digits (1)
- Amyotrophic Lateral Sclerosis (ALS) (1)
- Down syndrome (1)
- Parkinson's disease (1)
- cerebral palsy (1)
- stroke (1)
- stutter (1)
- Non-native speech (1)
- adaptation (1)
- diapix (1)
- Middlesbrough (1)
- Sunderland (1)
- speech acoustics (1)
- longitudinal (1)
- formant tracking (1)
- anatomy (1)
- app (1)
- larynx (1)
- typically developing (1)
- cleft (1)
- x-ray (1)
- x-ray microbeam (1)
- L2 speech (1)
- language learning (1)
- electroglottography (EGG) (1)
- intraoral pressure (1)
- validation (1)
- hyoid (1)
- antiresonance (1)
- vocal tract resonance (1)
- corner vowels (1)
- developmental trajectory (1)
- sexual dimorphism (1)
- loudness (1)
- subglottal pressure (1)
- tenor (1)
- vibrato (1)
- liquids (1)
- nasals (1)
- plosives (1)
- morphometric (1)
- Lombard speech (1)
- clear speech (1)
- computer-directed speech (1)
- infant-directed speech (1)
- non-native-directed speech (1)
- Scottish English (1)
- coarticulation (1)
- within-speaker variability (1)
- phone duration (1)
- pitch (1)
- CAPE-V (1)
- GRBAS (1)
- clinical (1)
- voice quality (1)
- vocal tract transfer function (1)
- professional voice (1)
- silent speech (1)
- 3D head meshes (1)
- German (1)
- acoustic pharyngometry (1)
- electroencephalography (EEG) (1)
- external craniofacial anthropometry (1)
- rhinometry (1)
- syllable sequences (1)
- partial spoof (1)
- ASVspoof (1)
- Amharic (1)
- Swahili (1)
- Wolof (1)
- Korean (1)
- Sinhala (1)
- Khmer (1)
- Afrikaans (1)
- Sesotho (1)
- Setswana (1)
- isiXhosa (1)
- Spanish accent (1)
- Czech (1)
- Southern standard British English (SSBE) (1)
- Bradford (1)
- Kirklees (1)
- Wakefield (1)
- West Yorkshire (1)
- consonants (1)
- dentition (1)
- maxilla (1)
- Arabic (1)
- accent variability (1)
- dialect variability (1)
- Putonghua (1)
- Derby (1)
- Leeds (1)
- Manchester (1)
- York (1)
- Ohio (1)
- brain activity (1)
- vocal imitation (1)
- sociolinguistic (1)
- World Englishes (1)
- dyadic (1)
- African (1)
- Cameroon (1)
- Chad (1)
- Congo (1)
- Gabon (1)
- Niger (1)
- evolution of speech (1)
- speech motor control (1)
- anatomical measurements (1)
Resource type
- Conference Paper (1)
- Dataset (94)
- Journal Article (23)
- Preprint (2)
- Report (1)
- Software (19)
- Web Page (15)