Your search
Results: 78 resources
- The Fake-or-Real (FoR) dataset is a collection of more than 195,000 utterances of real human and computer-generated speech. The dataset can be used to train classifiers to detect synthetic speech. It aggregates data from recent TTS systems (such as Deep Voice 3 and Google WaveNet TTS) as well as a variety of real human speech, including the Arctic Dataset...
- The M-AILABS Speech Dataset is the first large dataset that we are providing free of charge, freely usable as training data for speech recognition and speech synthesis. Most of the data is based on LibriVox and Project Gutenberg. The training data consist of nearly a thousand hours of audio and text files in a prepared format. A transcription is provided for each clip. Clips vary in length...
- This repository introduces ShiftySpeech, a large-scale synthetic speech dataset with distribution shifts. Key features: 3000+ hours of synthetic speech spanning 7 key distribution shifts, including reading style, podcast, YouTube, languages (three different languages), demographics (including variations in age, accent, and gender), multiple...
- CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems. TL;DR: We show that better detection of deepfake speech from codec-based TTS systems can be achieved by training models on speech re-synthesized with neural audio codecs. This dataset is released for that purpose. See our paper and GitHub for more details on using our...
- The In-the-Wild dataset contains real and synthetic speech recordings of 58 celebrities and politicians, collected from online videos. It provides a realistic benchmark for testing how well audio deepfake detection models generalize beyond laboratory data such as ASVspoof. Task: Audio Classification (Deepfake / Genuine). Languages: English. Modality: Audio. Size: 37.9 hours total (17.2 hours fake, 20.7 hours real).
- SpeechFake is a large-scale multilingual dataset for speech deepfake detection, featuring over 3 million fake samples across 46 languages. Generated using 30 diverse open-source models spanning text-to-speech (TTS), voice conversion/cloning (VC), and neural vocoder (NV) methods, it offers rich metadata and strong coverage of modern generation techniques, enabling robust and generalizable detection research.
- This speech corpus contains recordings of 104 monolingual native Southern British English speakers aged between 8 and 85 years old as they engaged in a problem-solving picture-based ‘spot the difference’ task (Diapix) with a conversational partner in four listening conditions. In NORM (quiet, no masking), participants heard each other normally. In SPSN (speech-shaped noise), participants...
- This collection contains the quantitative data resulting from the analysis of the elderLUCID audio corpus – a set of speech recordings collected for 83 adults aged 19 to 84 years inclusive. Recordings were made while participants carried out two types of collaborative tasks with a conversational partner who was a young adult of the same sex: (1) a ‘spot the difference’ picture task (‘diapix’)...
- A fully-annotated corpus of spontaneous speech dialogues for children. The Diapix task was recorded as stereo WAV files with one speaker per channel. Speakers: 96 non-bilingual native Southern British English children aged between 9 and 14 years old.
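Because each dialogue in a corpus like this is stored as a stereo WAV with one speaker per channel, recovering per-speaker audio is a matter of de-interleaving the channels. A minimal sketch using only Python's standard-library `wave` module, assuming 16-bit PCM files (the function and file names here are illustrative, not part of the corpus documentation):

```python
import wave

def split_stereo(wav_file):
    """De-interleave a 16-bit stereo WAV into two mono byte streams.

    Returns (left_channel, right_channel, sample_rate); each channel
    holds one speaker's raw PCM samples.
    """
    with wave.open(wav_file, "rb") as w:
        assert w.getnchannels() == 2, "expected stereo input"
        assert w.getsampwidth() == 2, "expected 16-bit samples"
        frames = w.readframes(w.getnframes())
        rate = w.getframerate()
    left, right = bytearray(), bytearray()
    # Stereo frames are interleaved [L0, R0, L1, R1, ...], 2 bytes per sample
    for i in range(0, len(frames), 4):
        left += frames[i:i + 2]
        right += frames[i + 2:i + 4]
    return bytes(left), bytes(right), rate
```

Each returned byte string can then be written back out as a mono WAV, or decoded with `struct`/NumPy for analysis.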
- The Nijmegen Corpus of Casual Czech contains 30 hours of high-quality recordings featuring 60 Czech speakers conversing among friends. The speech has been orthographically transcribed.
- The Nijmegen Corpus of Casual French contains 35 hours of high-quality recordings featuring 46 French speakers conversing among friends. The speech has been orthographically annotated by professional transcribers.
- The Nijmegen Corpus of Casual Spanish contains around 30 hours of high-quality recordings featuring 52 Spanish speakers from Madrid conversing among friends. The speech has been orthographically annotated by professional transcribers.
- The Nijmegen Corpus of Spanish English (NCSE) contains 38.5 hours of high-quality recordings of English speech produced by 34 native Spanish speakers in interaction with two native Dutch confederates. The NCSE contains a formal and an informal recording for each Spanish speaker. The speech has been orthographically transcribed.
- Multi-speaker TTS data for Bangladesh Bengali (bn-BD) and Indian Bengali (bn-IN).
- Multi-speaker TTS data for four South African languages: Afrikaans, Sesotho, Setswana, and isiXhosa. This dataset contains multi-speaker, high-quality transcribed audio data for four languages of South Africa. It consists of WAV files and a TSV file transcribing the audio. In each folder, the file line_index.tsv contains a FileID, which in turn contains the UserID and the...
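A `line_index.tsv` of this shape can be loaded with Python's standard `csv` module. A minimal sketch, assuming two tab-separated columns (FileID, transcription) — the exact column layout and FileID naming scheme vary by dataset, so check the actual file before relying on this:

```python
import csv

def load_index(tsv_path):
    """Read a line_index.tsv into a {file_id: transcription} mapping.

    Assumes two tab-separated columns (FileID, transcription); rows with
    fewer columns are skipped. The column layout is an assumption here,
    not a documented guarantee of any particular dataset.
    """
    index = {}
    with open(tsv_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) >= 2:
                index[row[0]] = row[1]
    return index
```

The resulting mapping can be joined against the WAV filenames in each folder to pair audio with its transcription.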