Your search
Results 13 resources
-
The Sociolinguistic Archive and Analysis Project, at North Carolina State University, is an interactive web-based archive of sociolinguistic recordings, with integrated media playing and annotation features, as well as phonetic analysis and corpus analysis tools designed for enabling and improving empirical linguistic inquiry. The archive continues to grow over time. It currently contains (as...
-
For over half a century, the UCLA Phonetics Laboratory has collected recordings of hundreds of languages from around the world, providing source materials for phonetic and phonological research, of value to scholars, speakers of the languages, and language learners alike. The materials on this site comprise audio recordings illustrating phonetic structures from over 200 languages with phonetic...
-
VoxAngeles is a corpus of audited phonetic transcriptions and phone-level alignments of the UCLA Phonetics Lab Archive (Ladefoged et al., 2009, http://archive.phonetics.ucla.edu/), along with phonetic measurements including word and phone durations, vowel f0 and vowel formants. The audited portion of the corpus currently contains data from 95 languages across 21 language families. Unaudited...
-
Single male native British English talker recorded producing 25 TIMIT sentences in 5 conditions, two natural: (i) quiet, (ii) while the talker listened to high-intensity speech-shaped noise, and three acted: (i) as if to a non-native listener, (ii) as if to a computer speech-recognition system, (iii) as if to an infant. Accompanied by automatic and hand-corrected phone-level transcription.
-
DECTE is an amalgamation of the existing Newcastle Electronic Corpus of Tyneside English (NECTE), created between 2001 and 2005, and NECTE2, a collection of interviews conducted in the Tyneside area since 2007. It thereby constitutes a rare example of a publicly available on-line corpus presenting dialect material spanning five decades.
-
This site allows visitors to access recordings of speakers who stutter and background details about these speakers and the conditions in which the recordings were made. The recordings are available in various formats. The main two sets of recordings were made in normal speaking conditions and the final one was made when the sound of the speaker’s voice was altered as he or she spoke. The three...
-
The current data package includes 1,090 hours of recorded speech (as .wav files) from about 1,130 participants, including those with ALS, cerebral palsy, Down syndrome, Parkinson’s disease and those who have had a stroke. The download also includes text of the original speech prompts and a transcript of the participants’ responses. A subset includes annotations describing the speech...
-
The Buckeye Corpus of conversational speech contains high-quality recordings from 40 speakers in Columbus OH conversing freely with an interviewer. The speech has been orthographically transcribed and phonetically labeled. The audio and text files, together with time-aligned phonetic labels, are stored in a format for use with speech analysis software (Xwaves and Wavesurfer).
-
Abstract: The study of articulatory gestures has a wide spectrum of applications, notably in speech production and recognition. Sets of phonemes, as well as their articulation, are language-specific; however, existing MRI databases mostly include English speakers. In our present work, we introduce a dataset acquired with MRI from 10 healthy native French speakers. A corpus...
-
These transcripts and video files are samples of Spanish and English caregiver (almost always mother)-child interaction collected at child ages 2 ½, 3, and 3 ½ years as part of a 10-year longitudinal study of the language and literacy development of U.S.-born children raised in Spanish-speaking homes. Each recording is approximately 30 minutes in length. The caregiver and target child are...
-
This dataset contains 350 parallel utterances spoken by 10 native Mandarin speakers and 10 English speakers in 5 emotional states (neutral, happy, angry, sad, and surprise). The transcripts are provided.
-
The Voices Obscured in Complex Environmental Settings (VOiCES) corpus is a Creative Commons speech dataset targeting acoustically challenging and reverberant environments with robust labels and truth data for transcription, denoising, and speaker identification. This is one of the largest corpora to date that has transcriptions and simultaneously recorded real-world noise. The details: -...
-
VoxForge is an open speech dataset that was set up to collect transcribed speech for use with Free and Open Source Speech Recognition Engines (on Linux, Windows and Mac).
Explore
Audio
- Accent/Region (3)
- American English (2)
- British English (1)
- Child Speech (2)
- Conversation (3)
- Directed Speech (1)
- Emotional Speech (1)
- Language (8)
- Multi-Speaker (4)
- Pathological (2)
- Speech in Noise (2)
Derived & Measured Data
Speech Production & Articulation
- MRI (1)
Vocal Anatomy
- Vocal Tract (1)
Tags
- transcribed
- audio data (8)
- English (5)
- female (4)
- male (4)
- read speech (3)
- adult (3)
- child speech (2)
- American English (2)
- phonetic labels (2)
- speech-language pathology (2)
- spontaneous speech (2)
- multi-language (2)
- open-source (1)
- speech recognition (1)
- environmental noise (1)
- noisy audio (1)
- reverberation (1)
- Mandarin (1)
- angry (1)
- emotional speech (1)
- happy (1)
- sad (1)
- surprise (1)
- Spanish (1)
- bilingual (1)
- child-centered audio (1)
- mother-child interaction (1)
- French (1)
- MRI (1)
- rtMRI (1)
- volumetric MRI (1)
- Ohio (1)
- conversation (1)
- British English (1)
- Newcastle (1)
- Amyotrophic Lateral Sclerosis (ALS) (1)
- Down syndrome (1)
- Parkinson's disease (1)
- annotated (1)
- cerebral palsy (1)
- stroke (1)
- stutter (1)
- Lombard speech (1)
- clear speech (1)
- computer-directed speech (1)
- infant-directed speech (1)
- non-native-directed speech (1)
- speech in noise (1)
- formant measurement (1)
- phone duration (1)
- phone-level alignment (1)
- pitch (1)
- interview (1)
- sociolinguistic (1)
- sociophonetic (1)
Resource type
- Dataset (9)
- Journal Article (1)
- Software (1)
- Web Page (2)