Results 16 resources
-
This speech corpus contains recordings for 104 monolingual native southern British English speakers aged between 8 and 85 years old while they engaged in a problem-solving picture-based ‘spot the difference’ task (Diapix) with a conversational partner in four listening conditions. In NORM (quiet, no masking), participants heard each other normally. In SPSN (speech-shaped noise), participants...
-
This collection contains the quantitative data resulting from the analysis of the elderLUCID audio corpus – a set of speech recordings collected for 83 adults aged 19 to 84 years inclusive. Recordings were made while participants carried out two types of collaborative tasks with a conversational partner who was a young adult of the same sex: (1) a ‘spot the difference’ picture task (‘diapix’)...
-
Fully annotated corpus of spontaneous speech dialogues for children. Diapix task recorded as stereo WAV files with one speaker per channel. 96 children aged 9 to 14 years; non-bilingual native Southern British English speakers.
-
The Nijmegen Corpus of Casual Czech contains 30 hours of high-quality recordings featuring 60 Czech speakers conversing among friends. The speech has been orthographically transcribed.
-
The Nijmegen Corpus of Casual French contains 35 hours of high-quality recordings featuring 46 French speakers conversing among friends. The speech has been orthographically annotated by professional transcribers.
-
The Nijmegen Corpus of Casual Spanish contains around 30 hours of high-quality recordings featuring 52 Spanish speakers from Madrid conversing among friends. The speech has been orthographically annotated by professional transcribers.
-
English is the most widely spoken language in the world, used daily by millions of people as a first or second language in many different contexts. As a result, there are many varieties of English. Despite the many advances in English automatic speech recognition (ASR) over the past decades, results are usually reported on test datasets that fail to represent the diversity of...
-
The Sociolinguistic Archive and Analysis Project, at North Carolina State University, is an interactive web-based archive of sociolinguistic recordings, with integrated media playing and annotation features, as well as phonetic analysis and corpus analysis tools designed for enabling and improving empirical linguistic inquiry. The archive continues to grow over time. It currently contains (as...
-
The Tongue and Lips (TaL) corpus is a multi-speaker corpus of ultrasound images of the tongue and video images of the lips. This corpus contains synchronised imaging data of extraoral (lips) and intraoral (tongue) articulators from 82 native speakers of English. The TaL corpus consists of two datasets: - TaL1...
-
We introduce the Speak & Improve Corpus 2025, a dataset of L2 learner English data with holistic scores and language error annotation, collected from open (spontaneous) speaking tests on the Speak & Improve learning platform. The aim of the corpus release is to address a major challenge to developing L2 spoken language processing systems, the lack of publicly available data with high-quality...
-
The current data package includes 1,090 hours of recorded speech (as .wav files) from about 1,130 participants, including those with ALS, cerebral palsy, Down syndrome, Parkinson’s disease and those who have had a stroke. The download also includes text of the original speech prompts and a transcript of the participants’ responses. A subset includes annotations describing the speech...
-
The MSP-AVW is an audiovisual whisper corpus for audiovisual speech recognition. The MSP-AVW corpus contains data from 20 female and 20 male speakers. For each subject, three sessions are recorded, consisting of read sentences, isolated digits and spontaneous speech. The data is recorded under neutral and whisper conditions. The corpus was collected in a 13ft x 13ft ASHA certified...
-
Forensic database of voice recordings of 500+ Australian English speakers (AusEng 500+). This database contains 3899 recordings totalling 310 hours of speech from 555 Australian-English speakers. 324 female speakers: - 91 recorded in one recording session - 69 recorded in two separate recording sessions - 159 recorded in three recording sessions - 5 recorded in more than three recording...
-
Twenty-five countries have Arabic as an official language, but the dialects spoken vary greatly, and even within one country different accents are heard. Many features create the impression of 'a different accent', including how particular sounds are pronounced, where stress falls in a word, and what intonation pattern is used. There is extensive prior research on the first two of these for...
-
Dynamic Dialects contains an articulatory video-based corpus of speech samples from world-wide accents of English. Videos in this corpus contain synchronised audio, ultrasound-tongue-imaging video and video of the moving lips. We are continuing to augment the database. The website contains three main resources: - A clickable Accent Map: clicking on points of the map will open up links to...
Explore
Audio Data
- Accents (4)
- Child Speech (2)
- Conversation (8)
- Emotional Speech (1)
- Forensic (1)
- Language (2)
- English (2)
- L2+ (1)
- Language Learning (1)
- Pathological (1)
- Speech in Noise (4)
Speech Production Data
- Ultrasound (2)
- Video (2)
Teaching Resources
Tags
- spontaneous speech
- audio data (15)
- read speech (8)
- transcribed (8)
- conversation (8)
- English (7)
- adult (4)
- video (3)
- interview (3)
- older adult (3)
- speech in noise (3)
- whisper (2)
- female (2)
- male (2)
- annotated (2)
- L2 English (2)
- British English (2)
- child speech (2)
- sociophonetic (2)
- anechoic (1)
- emotional speech (1)
- fast speech (1)
- high pitch (1)
- loud speech (1)
- low pitch (1)
- shout (1)
- slow speech (1)
- audiovisual (1)
- digits (1)
- Amyotrophic Lateral Sclerosis (ALS) (1)
- Down syndrome (1)
- Parkinson's disease (1)
- cerebral palsy (1)
- speech-language pathology (1)
- stroke (1)
- L2 speech (1)
- language learning (1)
- professional voice (1)
- silent speech (1)
- ultrasound (1)
- Czech (1)
- French (1)
- Spanish (1)
- lip video (1)
- teaching resource (1)
- ultrasound tongue imaging (UTI) (1)
- Arabic (1)
- accent variability (1)
- dialect variability (1)
- Australian (1)
- forensic (1)
- individual variability (1)
- American English (1)
- sociolinguistic (1)
- World Englishes (1)
- dyadic (1)