Full catalogue 155 resources
- Participants: English participants were 49 young adults (30 female; mean age = 21.3, SD = 3.6) with no history of psychiatric, neurological, or other medical illness that might compromise cognitive function. They self-identified as native English speakers and strictly qualified as right-handed on the Edinburgh Handedness Inventory. All participants were paid and gave written informed consent...
- The InterTVA dataset was acquired with two main objectives. First, from a neuroscientific perspective, it aims to study the inter-individual differences observed in people's ability to perform voice perception and voice identification tasks. Second, from a methodological perspective, it should allow benchmarking of multi-view machine learning methods. Indeed, it includes several MRI...
- Introduction: The morphology of the vocal tract plays a crucial role in singing. Adjustments of the lower part of the vocal tract are essential for voice quality and timbre. Structured investigations of this region are challenging due to the small extent of the morphological modifications. Material and methods: This study analyzed the morphology of the endolaryngeal tube and parts of the...
- A prominent model of the origins of speech, known as the “frame/content” theory, posits that oscillatory lowering and raising of the jaw provided an evolutionary scaffold for the development of syllable structure in speech. Because such oscillations are non‐vocal in most non‐human primates, the evolution of speech required the addition of vocalization onto this scaffold in order to turn such...
- The Fake-or-Real (FoR) dataset is a collection of more than 195,000 utterances of real human speech and computer-generated speech. The dataset can be used to train classifiers to detect synthetic speech. It aggregates data from the latest TTS solutions (such as Deep Voice 3 and Google WaveNet TTS) as well as a variety of real human speech, including the Arctic Dataset...
- The M-AILABS Speech Dataset is the first large dataset that we are providing free of charge, freely usable as training data for speech recognition and speech synthesis. Most of the data is based on LibriVox and Project Gutenberg. The training data consist of nearly a thousand hours of audio and the text files in a prepared format. A transcription is provided for each clip. Clips vary in length...
- We present the MLAAD dataset, a multi-language dataset for the task of audio anti-spoofing. This dataset has been created using a diverse set of text-to-speech (TTS) models and is designed to evaluate the out-of-domain generalization of anti-spoofing systems with respect to both new languages and new TTS models. Specifically, MLAAD comprises: 678.3 hours of synthetic...
- This repository introduces ShiftySpeech: A Large-Scale Synthetic Speech Dataset with Distribution Shifts. Key features: 3000+ hours of synthetic speech. Diverse distribution shifts: the dataset spans 7 key distribution shifts, including reading style, podcast, YouTube, languages (three different languages), and demographics (including variations in age, accent, and gender). Multiple...
- CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems. TL;DR: We show that better detection of deepfake speech from codec-based TTS systems can be achieved by training models on speech re-synthesized with neural audio codecs. This dataset is released for this purpose. See our paper and GitHub for more details on using our...
- The In-the-Wild dataset contains real and synthetic speech recordings of 58 celebrities and politicians, collected from online videos. It provides a realistic benchmark for testing how well audio deepfake detection models generalize beyond laboratory data such as ASVspoof. Task: audio classification (deepfake / genuine). Languages: English. Modality: audio. Size: 37.9 hours total (17.2 hours fake, 20.7 hours real).
- SpeechFake is a large-scale multilingual dataset for speech deepfake detection, featuring over 3 million fake samples across 46 languages. Generated using 30 diverse open-source models spanning text-to-speech (TTS), voice conversion or cloning (VC), and neural vocoder (NV) methods, it offers rich metadata and strong coverage of modern generation techniques, enabling robust and generalizable detection research.
- This speech corpus contains recordings of 104 monolingual native southern British English speakers aged between 8 and 85 years, made while they engaged in a problem-solving picture-based ‘spot the difference’ task (Diapix) with a conversational partner in four listening conditions. In NORM (quiet, no masking), participants heard each other normally. In SPSN (speech-shaped noise), participants...
- This collection contains the quantitative data resulting from the analysis of the elderLUCID audio corpus – a set of speech recordings collected for 83 adults aged 19 to 84 years inclusive. Recordings were made while participants carried out two types of collaborative tasks with a conversational partner who was a young adult of the same sex: (1) a ‘spot the difference’ picture task (‘diapix’)...
Explore
Audio Data
- Accent/Region (3)
  - British English (2)
  - World Englishes (1)
- Accents (9)
- Child Speech (11)
- Conversation (17)
- Directed Speech (1)
- Electroglottography / Electrolaryngography (1)
- Emotional Speech (5)
- Forensic (5)
- Language (18)
  - African Languages (1)
  - Bi-/Multilingual (1)
  - English (11)
  - French (1)
  - German (1)
  - Korean (1)
  - L2+ (1)
  - Language Learning (1)
  - Mandarin (2)
  - Multiple (2)
  - Spanish (1)
- Pathological (9)
- Singing (3)
- Speech in Noise (7)
- Synthetic Speech (11)
Derived & Measured Data
- Formant Measurements (7)
- Fundamental Frequency (2)
- Phone-Level Alignments (1)
- Subglottal Tract (3)
- Vocal Tract (10)
- Voice Quality Measures (1)
Software, Processing & Utilities
- Feature Extraction (4)
- Image and Volume Segmentation (3)
- Numerical Acoustic Modelling (3)
- Phone Apps (1)
- Speech Processing (5)
- Transcription (3)
- Utilities (4)
Speech Perception Data
- Brain Imaging (2)
Speech Production Data
- Articulography (3)
- Brain Imaging (2)
- EEG (1)
- MRI (14)
- Ultrasound (10)
- Video (3)
- Vocal Anatomy (23)
  - Hyoid (1)
  - Larynx and Glottis (3)
  - Mandible and Maxilla (3)
  - Mechanical Properties (1)
  - Models (2)
  - Vocal Tract (13)
  - X-Ray (1)
Teaching Resources
- 3D Models (2)
- Articulation Data (3)
- Tutorials (2)
- Videos (2)
Tags
- audio data (78)
- adult (41)
- transcribed (36)
- male (34)
- English (30)
- female (29)
- read speech (25)
- spontaneous speech (16)
- magnetic resonance imaging (MRI) (13)
- child speech (12)
- real-time MRI (rtMRI) (12)
- conversation (12)
- vowels (11)
- synthetic speech (11)
- formant measurement (10)
- speech-language pathology (9)
- deepfake (8)
- vocal tract shape (8)
- speech processing (7)
- video (7)
- ultrasound (7)
- interview (7)
- individual variability (7)
- teaching resource (6)
- segmentation (6)
- child (6)
- volumetric MRI (6)
- MATLAB (5)
- open-source (5)
- older adult (5)
- Mandarin (5)
- American English (5)
- articulatory data (5)
- automatic speech recognition (ASR) (4)
- speech recognition (4)
- emotional speech (4)
- speech production (4)
- annotated (4)
- British English (4)
- vocal tract area function (4)
- speech in noise (4)
- text-to-speech (TTS) (4)
- French (4)
- forensic (4)
- telephone (4)
- functional magnetic resonance imaging (fMRI) (4)
- numerical acoustic modelling (3)
- STL files (3)
- speaker diarization (3)
- audio processing (3)
- transcription (3)
- Python (3)
- spoof (3)
- English accents (3)
- singing (3)
- British (3)
- angry (3)
- happy (3)
- sad (3)
- perceptually annotated (3)
- electromagnetic articulography (EMA) (3)
- MRI (3)
- pathological speech (3)
- ultrasound tongue imaging (UTI) (3)
- Newcastle (3)
- speech sound disorder (3)
- L2 English (3)
- computed tomography (CT) (3)
- mandible (3)
- DICOM (3)
- multi-language (3)
- Japanese (3)
- source-filter model (2)
- tube model (2)
- Praat (2)
- phonetics (2)
- child-centered audio (2)
- file format conversion (2)
- feature extraction (2)
- speech to text (2)
- speech activity detection (2)
- voice activity detection (2)
- whisper (2)
- audiovisual (2)
- Spanish (2)
- International Phonetic Alphabet (IPA) (2)
- vocal tract length (2)
- subglottal tract (2)
- fundamental frequency (2)
- benchmark (2)
- glottis (2)
- videoendoscopy (2)
- phone-level alignment (2)
- finite element method (FEM) (2)
- held vowel (2)
- voice conversion (VC) (2)
- Chinese (2)
- Sudanese (2)
- Nepali (2)
- Javanese (2)
- Bengali (2)
- map task (2)
- articulation (2)
- multimodal (2)
- lip video (2)
- sociophonetic (2)
- Australian (2)
- phonetic labels (2)
- speech perception (2)
- area function (1)
- vocal fold model (1)
- 3D print (1)
- TextGrid (1)
- software (1)
- spectrogram (1)
- speech analysis (1)
- language development (1)
- language environment analysis (LENA) (1)
- word count estimation (1)
- record audio (1)
- stream audio (1)
- cepstral peak prominence (CPP) (1)
- harmonic-to-noise ratio (HNR) (1)
- C++ (1)
- classification (1)
- emotion recognition (1)
- speaker identification (1)
- conversational AI (1)
- overlapped speech detection (1)
- speaker embedding (1)
- anechoic (1)
- fast speech (1)
- high pitch (1)
- loud speech (1)
- low pitch (1)
- shout (1)
- slow speech (1)
- logical access (1)
- physical access (1)
- speaker detection (1)
- two-class recognizer (1)
- rainbow passage (1)
- labelled (1)
- non-speech (1)
- environmental noise (1)
- noisy audio (1)
- reverberation (1)
- disgust (1)
- surprise (1)
- podcast (1)
- bilingual (1)
- mother-child interaction (1)
- speech rate (1)
- syllable (1)
- syllable nuclei (1)
- speech synthesis (1)
- image processing (1)
- dysarthria (1)
- digits (1)
- Amyotrophic Lateral Sclerosis (ALS) (1)
- Down syndrome (1)
- Parkinson's disease (1)
- cerebral palsy (1)
- stroke (1)
- stutter (1)
- Non-native speech (1)
- adaptation (1)
- diapix (1)
- Middlesbrough (1)
- Sunderland (1)
- speech acoustics (1)
- longitudinal (1)
- formant tracking (1)
- anatomy (1)
- app (1)
- larynx (1)
- typically developing (1)
- cleft (1)
- x-ray (1)
- x-ray microbeam (1)
- L2 speech (1)
- language learning (1)
- electroglottography (EGG) (1)
- intraoral pressure (1)
- validation (1)
- hyoid (1)
- antiresonance (1)
- vocal tract resonance (1)
- corner vowels (1)
- developmental trajectory (1)
- sexual dimorphism (1)
- loudness (1)
- subglottal pressure (1)
- tenor (1)
- vibrato (1)
- liquids (1)
- nasals (1)
- plosives (1)
- morphometric (1)
- Lombard speech (1)
- clear speech (1)
- computer-directed speech (1)
- infant-directed speech (1)
- non-native-directed speech (1)
- Scottish English (1)
- coarticulation (1)
- within-speaker variability (1)
- phone duration (1)
- pitch (1)
- CAPE-V (1)
- GRBAS (1)
- clinical (1)
- voice quality (1)
- vocal tract transfer function (1)
- professional voice (1)
- silent speech (1)
- 3D head meshes (1)
- German (1)
- acoustic pharyngometry (1)
- electroencephalography (EEG) (1)
- external craniofacial anthropometry (1)
- rhinometry (1)
- syllable sequences (1)
- partial spoof (1)
- ASVspoof (1)
- Amharic (1)
- Swahili (1)
- Wolof (1)
- Korean (1)
- Sinhala (1)
- Khmer (1)
- Afrikaans (1)
- Sesotho (1)
- Setswana (1)
- isiXhosa (1)
- Spanish accent (1)
- Czech (1)
- Southern standard British English (SSBE) (1)
- Bradford (1)
- Kirklees (1)
- Wakefield (1)
- West Yorkshire (1)
- consonants (1)
- dentition (1)
- maxilla (1)
- Arabic (1)
- accent variability (1)
- dialect variability (1)
- Putonghua (1)
- Derby (1)
- Leeds (1)
- Manchester (1)
- York (1)
- Ohio (1)
- brain activity (1)
- vocal imitation (1)
- sociolinguistic (1)
- World Englishes (1)
- dyadic (1)
- African (1)
- Cameroon (1)
- Chad (1)
- Congo (1)
- Gabon (1)
- Niger (1)
- evolution of speech (1)
- speech motor control (1)
- anatomical measurements (1)
Resource type
- Conference Paper (1)
- Dataset (94)
- Journal Article (23)
- Preprint (2)
- Report (1)
- Software (19)
- Web Page (15)