Results | YorVoice Catalogue

LifeLUCID Corpus: Recordings of Speakers Aged 8 to 85 Years Engaged in Interactive Task in the Presence of Energetic and Informational Masking, 2017-2020

Outi Tuomainen, Linda Taschenberger, Valerie Hazan

This speech corpus contains recordings of 104 monolingual native speakers of southern British English, aged between 8 and 85 years, made while they engaged in a picture-based problem-solving ‘spot the difference’ task (Diapix) with a conversational partner in four listening conditions. In NORM (quiet, no masking), participants heard each other normally. In SPSN (speech-shaped noise), participants...

View on reshare.ukdataservice.ac.uk

elderLUCID: London UCL Older Adults' clear speech in interaction database

Valerie Hazan, Outi Tuomainen, Jeesun Kim + 1 other

This collection contains the quantitative data resulting from the analysis of the elderLUCID audio corpus – a set of speech recordings collected for 83 adults aged 19 to 84 years inclusive. Recordings were made while participants carried out two types of collaborative tasks with a conversational partner who was a young adult of the same sex: (1) a ‘spot the difference’ picture task (‘diapix’)...

View on reshare.ukdataservice.ac.uk

kidLUCID

Fully annotated corpus of spontaneous speech dialogues for children. The Diapix task was recorded as stereo WAV files with one speaker per channel. The speakers are 96 children aged between 9 and 14 years, all non-bilingual native speakers of Southern British English.

View on speechbox.linguistics.northwestern.edu

Nijmegen Corpus of Casual Czech

The Nijmegen Corpus of Casual Czech contains 30 hours of high-quality recordings featuring 60 Czech speakers conversing among friends. The speech has been orthographically transcribed.

View on mirjamernestus.nl

Nijmegen Corpus of Casual French

The Nijmegen Corpus of Casual French contains 35 hours of high-quality recordings featuring 46 French speakers conversing among friends. The speech has been orthographically annotated by professional transcribers.

View on mirjamernestus.nl

Nijmegen Corpus of Casual Spanish

The Nijmegen Corpus of Casual Spanish contains around 30 hours of high-quality recordings featuring 52 Spanish speakers from Madrid conversing among friends. The speech has been orthographically annotated by professional transcribers.

View on mirjamernestus.nl

The Edinburgh International Accents of English Corpus

Ramon Sanabria, Nina Markl, Andrea Carmantini + 4 others

English is the most widely spoken language in the world, used daily by millions of people as a first or second language in many different contexts. As a result, there are many varieties of English. Despite the great many advances in English automatic speech recognition (ASR) over the past decades, results are usually reported based on test datasets which fail to represent the diversity of...

View on datashare.ed.ac.uk

The Sociolinguistic Archive and Analysis Project (SLAAP)

Tyler Kendall

The Sociolinguistic Archive and Analysis Project, at North Carolina State University, is an interactive web-based archive of sociolinguistic recordings, with integrated media playing and annotation features, as well as phonetic analysis and corpus analysis tools designed for enabling and improving empirical linguistic inquiry. The archive continues to grow over time. It currently contains (as...

View on slaap.chass.ncsu.edu

The Tongue and Lips Corpus

M. S. Ribeiro, J. Sanger, J.-X. Zhang + 4 others

A multi-speaker corpus of ultrasound images of the tongue and video images of the lips. The Tongue and Lips (TaL) corpus is a multi-speaker corpus of ultrasound images of the tongue and video images of lips. This corpus contains synchronised imaging data of extraoral (lips) and intraoral (tongue) articulators from 82 native speakers of English. The TaL corpus consists of two datasets: - TaL1...

View on ultrasuite.github.io

Speak & Improve Corpus 2025: an L2 English Speech Corpus for Language Assessment and Feedback

Katherine Knill, Diane Nicholls, Mark Gales + 4 others

We introduce the Speak & Improve Corpus 2025, a dataset of L2 learner English data with holistic scores and language error annotation, collected from open (spontaneous) speaking tests on the Speak & Improve learning platform. The aim of the corpus release is to address a major challenge to developing L2 spoken language processing systems, the lack of publicly available data with high-quality...

View on www.repository.cam.ac.uk

Speech Accessibility Project

The current data package includes 1,090 hours of recorded speech (as .wav files) from about 1,130 participants, including those with ALS, cerebral palsy, Down syndrome, Parkinson’s disease and those who have had a stroke. The download also includes text of the original speech prompts and a transcript of the participants’ responses. A subset includes annotations describing the speech...

View on speechaccessibilityproject.beckman.illinois.edu

Audiovisual Whisper (AVW) Corpus

The MSP-AVW is an audiovisual whisper corpus for audiovisual speech recognition purposes. The MSP-AVW corpus contains data from 20 female and 20 male speakers. For each subject, three sessions are recorded, consisting of read sentences, isolated digits and spontaneous speech. The data is recorded under neutral and whisper conditions. The corpus was collected in a 13ft x 13ft ASHA certified...

View on ecs.utdallas.edu

Forensic Voice Comparison Databases: AusEng 500+

G. S. Morrison, C. Zhang, E. Enzinger + 8 others

Forensic database of voice recordings of 500+ Australian English speakers (AusEng 500+). This database contains 3899 recordings totalling 310 hours of speech from 555 Australian-English speakers.
324 female speakers:
- 91 recorded in one recording session
- 69 recorded in two separate recording sessions
- 159 recorded in three recording sessions
- 5 recorded in more than three recording...

View on forensic-voice-comparison.net

Intonational variation in Arabic Corpus

Sam Hellmuth, Rana Almbark

Twenty-five countries have Arabic as an official language, but the dialects spoken vary greatly, and even within one country different accents are heard. Many features create the impression of 'a different accent', including how particular sounds are pronounced, where stress falls in a word, and what intonation pattern is used. There is extensive prior research on the first two of these for...

View on reshare.ukdataservice.ac.uk

Dynamic Dialects

E. Lawson, J. Stuart-Smith, J. M. Scobbie + 1 other

Dynamic Dialects contains an articulatory video-based corpus of speech samples from world-wide accents of English. Videos in this corpus contain synchronised audio, ultrasound-tongue-imaging video and video of the moving lips. We are continuing to augment the database. The website contains three main resources: - A clickable Accent Map: clicking on points of the map will open up links to...

View on www.dynamicdialects.ac.uk

Results 16 resources
