Full catalogue: 155 resources
-
We release the EARS (Expressive Anechoic Recordings of Speech) dataset, a high-quality speech dataset comprising 107 speakers from diverse backgrounds and totalling 100 hours of clean, anechoic speech data. The dataset covers a wide range of speaking styles, including emotional speech, different reading styles, non-verbal sounds, and conversational freeform speech. We benchmark...
-
Expressive Anechoic Recordings of Speech (EARS). Highlights:
- 100 h of speech data from 107 speakers
- high-quality recordings at 48 kHz in an anechoic chamber
- high speaker diversity, with speakers of different ethnicities and ages ranging from 18 to 75 years
- full dynamic range of human speech, ranging from whispering to yelling
- 18 minutes of freeform monologues per speaker
- sentence...
-
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
-
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
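Silero VAD itself is a pre-trained neural model, and the sketch below does not reproduce it. It only illustrates the kind of output a voice activity detector produces, using a much simpler technique: a per-frame energy threshold in standard-library Python. The function names, the 30 ms frame length and the -40 dB threshold are illustrative assumptions.

```python
import math

def energy_vad(samples, sample_rate, frame_ms=30, threshold_db=-40.0):
    """Label each fixed-length frame as speech (True) or non-speech (False).

    Simple RMS-energy threshold (NOT the Silero VAD model): a frame counts as
    speech when its RMS level, in dB relative to full scale 1.0, exceeds
    threshold_db. Trailing samples that do not fill a whole frame are ignored.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    decisions = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        level_db = 20 * math.log10(rms) if rms > 0 else float("-inf")
        decisions.append(level_db > threshold_db)
    return decisions

def to_segments(decisions, frame_ms=30):
    """Merge runs of consecutive speech frames into (start_ms, end_ms) segments."""
    segments, start = [], None
    for i, is_speech in enumerate(decisions):
        if is_speech and start is None:
            start = i * frame_ms
        elif not is_speech and start is not None:
            segments.append((start, i * frame_ms))
            start = None
    if start is not None:
        segments.append((start, len(decisions) * frame_ms))
    return segments

if __name__ == "__main__":
    sr = 16000
    silence = [0.0] * 4800  # 0.3 s of digital silence
    tone = [0.5 * math.sin(2 * math.pi * 220 * n / sr) for n in range(4800)]  # 0.3 s stand-in for speech
    print(to_segments(energy_vad(silence + tone + silence, sr)))  # [(300, 600)]
```

An energy threshold treats any loud sound as speech; trained detectors are meant to be more robust in noisy recordings.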
-
SpeechBrain is an open-source PyTorch toolkit that accelerates Conversational AI development, i.e., the technology behind speech assistants, chatbots, and large language models. It is crafted for fast and easy creation of advanced technologies for Speech and Text Processing.
-
openSMILE (open-source Speech and Music Interpretation by Large-space Extraction) is a complete and open-source toolkit for audio analysis, processing and classification especially targeted at speech and music applications, e.g. automatic speech recognition, speaker identification, emotion recognition, or beat tracking and chord detection.
-
Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.
-
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
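Combining word-level timestamps with diarization means deciding which speaker said each word. One simple way to merge the two outputs, shown here as a sketch (the function names, tuple layouts and speaker labels are hypothetical and not WhisperX's API), is to give each word the speaker whose diarization turn overlaps it for the longest time.

```python
from typing import List, Optional, Tuple

Word = Tuple[str, float, float]   # (text, start_s, end_s) from word-level ASR
Turn = Tuple[float, float, str]   # (start_s, end_s, speaker) from diarization

def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def assign_speakers(words: List[Word], turns: List[Turn]) -> List[Tuple[str, Optional[str]]]:
    """Label each word with the speaker whose turn overlaps it the most.

    Words overlapping no turn get None; on a tie, the earlier turn in the list wins.
    """
    labelled = []
    for text, w_start, w_end in words:
        best_speaker, best_overlap = None, 0.0
        for t_start, t_end, speaker in turns:
            ov = overlap(w_start, w_end, t_start, t_end)
            if ov > best_overlap:
                best_speaker, best_overlap = speaker, ov
        labelled.append((text, best_speaker))
    return labelled

if __name__ == "__main__":
    words = [("hello", 0.0, 0.4), ("there", 0.5, 0.9), ("hi", 1.2, 1.5), ("um", 3.0, 3.2)]
    turns = [(0.0, 1.0, "SPEAKER_00"), (1.0, 2.0, "SPEAKER_01")]
    print(assign_speakers(words, turns))
    # [('hello', 'SPEAKER_00'), ('there', 'SPEAKER_00'), ('hi', 'SPEAKER_01'), ('um', None)]
```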
-
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
-
VoiceSauce is an application, implemented in MATLAB, which provides automated voice measurements over time from audio recordings. Inputs are standard wave (*.wav) files, and the measures currently computed are: F0, formants F1-F4, H1(*), H2(*), H4(*), A1(*), A2(*), A3(*), 2K(*), 5K, H1(*)-H2(*), H2(*)-H4(*), H1(*)-A1(*), H1(*)-A2(*), H1(*)-A3(*), H4(*)-2K(*), 2K(*)-5K, Energy, Cepstral Peak Prominence, Harmonic...
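VoiceSauce computes these measures in MATLAB; the Python sketch below is not its implementation. It illustrates only the uncorrected H1-H2 measure, the level of the first harmonic minus the level of the second in dB, by evaluating a DFT directly at F0 and at twice F0. It assumes F0 is already known and that the analysis window spans a whole number of pitch periods; the formant correction used by the corrected variants (which the (*) notation appears to denote) is omitted, and the function names are hypothetical.

```python
import cmath
import math

def harmonic_amplitude(samples, sample_rate, freq):
    """Magnitude of the DFT of `samples` evaluated at `freq` Hz (rectangular window)."""
    total = sum(
        x * cmath.exp(-2j * math.pi * freq * n / sample_rate)
        for n, x in enumerate(samples)
    )
    return abs(total)

def h1_minus_h2(samples, sample_rate, f0):
    """Uncorrected H1-H2 in dB: level of the 1st harmonic minus the 2nd.

    Assumes f0 is known (e.g. from a pitch tracker) and that the window covers
    a whole number of periods, so the two harmonics do not leak into each other.
    No formant correction is applied.
    """
    h1 = harmonic_amplitude(samples, sample_rate, f0)
    h2 = harmonic_amplitude(samples, sample_rate, 2 * f0)
    return 20 * math.log10(h1 / h2)

if __name__ == "__main__":
    sr, f0 = 16000, 200.0
    n_samples = 800  # exactly 10 periods of 200 Hz at 16 kHz
    # Synthetic "voice": 2nd harmonic at half the amplitude of the 1st.
    x = [math.sin(2 * math.pi * f0 * n / sr) + 0.5 * math.sin(2 * math.pi * 2 * f0 * n / sr)
         for n in range(n_samples)]
    print(round(h1_minus_h2(x, sr, f0), 2))  # 6.02, i.e. 20*log10(2)
```

A breathier voice typically shows a stronger first harmonic relative to the second, which is why H1-H2 is used as a voice quality measure.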
-
SoX is the Swiss Army Knife of sound processing utilities. It can convert audio files to other popular audio file types and also apply sound effects and filters during the conversion.
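As an illustration of what one such conversion involves at the sample level, here is a sketch that downmixes a 16-bit stereo WAV file to mono by averaging the two channels. It uses only Python's standard-library `wave` and `struct` modules, not SoX; the function name and the floor-averaging rule are assumptions, and SoX's own channel-mixing behaviour may differ in detail.

```python
import io
import struct
import wave

def stereo_to_mono_16bit(src, dst):
    """Downmix a 16-bit PCM stereo WAV (path or file-like) to mono.

    Each output sample is the average of the left and right samples,
    computed with floor division. The `wave` module exchanges frame data
    in native byte order, so native struct formats are used.
    """
    with wave.open(src, "rb") as reader:
        if reader.getnchannels() != 2 or reader.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo input")
        rate = reader.getframerate()
        n_frames = reader.getnframes()
        raw = reader.readframes(n_frames)
    # Interleaved signed 16-bit samples: L0 R0 L1 R1 ...
    samples = struct.unpack("%dh" % (2 * n_frames), raw)
    mono = [(samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples), 2)]
    with wave.open(dst, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)
        writer.setframerate(rate)
        writer.writeframes(struct.pack("%dh" % len(mono), *mono))

if __name__ == "__main__":
    # Tiny in-memory stereo file: left = 1000, right = -200 for 4 frames.
    stereo = io.BytesIO()
    with wave.open(stereo, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)
        w.setframerate(48000)
        w.writeframes(struct.pack("8h", *([1000, -200] * 4)))
    stereo.seek(0)
    mono = io.BytesIO()
    stereo_to_mono_16bit(stereo, mono)
    mono.seek(0)
    with wave.open(mono, "rb") as r:
        print(r.getnchannels(), struct.unpack("4h", r.readframes(4)))  # 1 (400, 400, 400, 400)
```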
-
A complete, cross-platform solution to record, convert and stream audio and video.
-
Recordings captured by wearable microphones are a standard method for investigating young children’s language environments. A key measure to quantify from such data is the amount of speech present in children’s home environments. To this end, the LENA recorder and software—a popular system for measuring linguistic input—estimates the number of adult words that children may hear over the course...
-
Automatic LInguistic Unit Count Estimator (ALICE). ALICE is a tool for estimating the number of adult-spoken linguistic units from child-centered audio recordings, as captured by microphones worn by children. It is meant as an open-source alternative to the LENA adult word count (AWC) estimator [1]. ALICE uses SylNet [2] for feature extraction and the voice type classifier [3] for broad-class...
Explore
Audio Data
- Accent/Region (3)
  - British English (2)
  - World Englishes (1)
- Accents (9)
- Child Speech (11)
- Conversation (17)
- Directed Speech (1)
- Electroglottography / Electrolaryngography (1)
- Emotional Speech (5)
- Forensic (5)
- Language (18)
  - African Languages (1)
  - Bi-/Multilingual (1)
  - English (11)
  - French (1)
  - German (1)
  - Korean (1)
  - L2+ (1)
  - Language Learning (1)
  - Mandarin (2)
  - Multiple (2)
  - Spanish (1)
- Pathological (9)
- Singing (3)
- Speech in Noise (7)
- Synthetic Speech (11)
Derived & Measured Data
- Formant Measurements (7)
- Fundamental Frequency (2)
- Phone-Level Alignments (1)
- Subglottal Tract (3)
- Vocal Tract (10)
- Voice Quality Measures (1)
Software, Processing & Utilities
- Feature Extraction (4)
- Image and Volume Segmentation (3)
- Numerical Acoustic Modelling (3)
- Phone Apps (1)
- Speech Processing (5)
- Transcription (3)
- Utilities (4)
Speech Perception Data
- Brain Imaging (2)
Speech Production Data
- Articulography (3)
- Brain Imaging (2)
- EEG (1)
- MRI (14)
- Ultrasound (10)
- Video (3)
- Vocal Anatomy (23)
  - Hyoid (1)
  - Larynx and Glottis (3)
  - Mandible and Maxilla (3)
  - Mechanical Properties (1)
  - Models (2)
  - Vocal Tract (13)
  - X-Ray (1)
Teaching Resources
- 3D Models (2)
- Articulation Data (3)
- Tutorials (2)
- Videos (2)
Tags
- audio data (78)
- adult (41)
- transcribed (36)
- male (34)
- English (30)
- female (29)
- read speech (25)
- spontaneous speech (16)
- magnetic resonance imaging (MRI) (13)
- child speech (12)
- real-time MRI (rtMRI) (12)
- conversation (12)
- vowels (11)
- synthetic speech (11)
- formant measurement (10)
- speech-language pathology (9)
- deepfake (8)
- vocal tract shape (8)
- speech processing (7)
- video (7)
- ultrasound (7)
- interview (7)
- individual variability (7)
- teaching resource (6)
- segmentation (6)
- child (6)
- volumetric MRI (6)
- MATLAB (5)
- open-source (5)
- older adult (5)
- Mandarin (5)
- American English (5)
- articulatory data (5)
- automatic speech recognition (ASR) (4)
- speech recognition (4)
- emotional speech (4)
- speech production (4)
- annotated (4)
- British English (4)
- vocal tract area function (4)
- speech in noise (4)
- text-to-speech (TTS) (4)
- French (4)
- forensic (4)
- telephone (4)
- functional magnetic resonance imaging (fMRI) (4)
- numerical acoustic modelling (3)
- STL files (3)
- speaker diarization (3)
- audio processing (3)
- transcription (3)
- Python (3)
- spoof (3)
- English accents (3)
- singing (3)
- British (3)
- angry (3)
- happy (3)
- sad (3)
- perceptually annotated (3)
- electromagnetic articulography (EMA) (3)
- MRI (3)
- pathological speech (3)
- ultrasound tongue imaging (UTI) (3)
- Newcastle (3)
- speech sound disorder (3)
- L2 English (3)
- computed tomography (CT) (3)
- mandible (3)
- DICOM (3)
- multi-language (3)
- Japanese (3)
- source-filter model (2)
- tube model (2)
- Praat (2)
- phonetics (2)
- child-centered audio (2)
- file format conversion (2)
- feature extraction (2)
- speech to text (2)
- speech activity detection (2)
- voice activity detection (2)
- whisper (2)
- audiovisual (2)
- Spanish (2)
- International Phonetic Alphabet (IPA) (2)
- vocal tract length (2)
- subglottal tract (2)
- fundamental frequency (2)
- benchmark (2)
- glottis (2)
- videoendoscopy (2)
- phone-level alignment (2)
- finite element method (FEM) (2)
- held vowel (2)
- voice conversion (VC) (2)
- Chinese (2)
- Sundanese (2)
- Nepali (2)
- Javanese (2)
- Bengali (2)
- map task (2)
- articulation (2)
- multimodal (2)
- lip video (2)
- sociophonetic (2)
- Australian (2)
- phonetic labels (2)
- speech perception (2)
- area function (1)
- vocal fold model (1)
- 3D print (1)
- TextGrid (1)
- software (1)
- spectrogram (1)
- speech analysis (1)
- language development (1)
- language environment analysis (LENA) (1)
- word count estimation (1)
- record audio (1)
- stream audio (1)
- cepstral peak prominence (CPP) (1)
- harmonic-to-noise ratio (HNR) (1)
- C++ (1)
- classification (1)
- emotion recognition (1)
- speaker identification (1)
- conversational AI (1)
- overlapped speech detection (1)
- speaker embedding (1)
- anechoic (1)
- fast speech (1)
- high pitch (1)
- loud speech (1)
- low pitch (1)
- shout (1)
- slow speech (1)
- logical access (1)
- physical access (1)
- speaker detection (1)
- two-class recognizer (1)
- rainbow passage (1)
- labelled (1)
- non-speech (1)
- environmental noise (1)
- noisy audio (1)
- reverberation (1)
- disgust (1)
- surprise (1)
- podcast (1)
- bilingual (1)
- mother-child interaction (1)
- speech rate (1)
- syllable (1)
- syllable nuclei (1)
- speech synthesis (1)
- image processing (1)
- dysarthria (1)
- digits (1)
- Amyotrophic Lateral Sclerosis (ALS) (1)
- Down syndrome (1)
- Parkinson's disease (1)
- cerebral palsy (1)
- stroke (1)
- stutter (1)
- Non-native speech (1)
- adaptation (1)
- diapix (1)
- Middlesbrough (1)
- Sunderland (1)
- speech acoustics (1)
- longitudinal (1)
- formant tracking (1)
- anatomy (1)
- app (1)
- larynx (1)
- typically developing (1)
- cleft (1)
- x-ray (1)
- x-ray microbeam (1)
- L2 speech (1)
- language learning (1)
- electroglottography (EGG) (1)
- intraoral pressure (1)
- validation (1)
- hyoid (1)
- antiresonance (1)
- vocal tract resonance (1)
- corner vowels (1)
- developmental trajectory (1)
- sexual dimorphism (1)
- loudness (1)
- subglottal pressure (1)
- tenor (1)
- vibrato (1)
- liquids (1)
- nasals (1)
- plosives (1)
- morphometric (1)
- Lombard speech (1)
- clear speech (1)
- computer-directed speech (1)
- infant-directed speech (1)
- non-native-directed speech (1)
- Scottish English (1)
- coarticulation (1)
- within-speaker variability (1)
- phone duration (1)
- pitch (1)
- CAPE-V (1)
- GRBAS (1)
- clinical (1)
- voice quality (1)
- vocal tract transfer function (1)
- professional voice (1)
- silent speech (1)
- 3D head meshes (1)
- German (1)
- acoustic pharyngometry (1)
- electroencephalography (EEG) (1)
- external craniofacial anthropometry (1)
- rhinometry (1)
- syllable sequences (1)
- partial spoof (1)
- ASVspoof (1)
- Amharic (1)
- Swahili (1)
- Wolof (1)
- Korean (1)
- Sinhala (1)
- Khmer (1)
- Afrikaans (1)
- Sesotho (1)
- Setswana (1)
- isiXhosa (1)
- Spanish accent (1)
- Czech (1)
- Southern standard British English (SSBE) (1)
- Bradford (1)
- Kirklees (1)
- Wakefield (1)
- West Yorkshire (1)
- consonants (1)
- dentition (1)
- maxilla (1)
- Arabic (1)
- accent variability (1)
- dialect variability (1)
- Putonghua (1)
- Derby (1)
- Leeds (1)
- Manchester (1)
- York (1)
- Ohio (1)
- brain activity (1)
- vocal imitation (1)
- sociolinguistic (1)
- World Englishes (1)
- dyadic (1)
- African (1)
- Cameroon (1)
- Chad (1)
- Congo (1)
- Gabon (1)
- Niger (1)
- evolution of speech (1)
- speech motor control (1)
- anatomical measurements (1)
Resource type
- Conference Paper (1)
- Dataset (94)
- Journal Article (23)
- Preprint (2)
- Report (1)
- Software (19)
- Web Page (15)