Your search
Results: 7 resources
- We introduce the Speak & Improve Corpus 2025, a dataset of L2 learner English speech with holistic scores and language error annotations, collected from open (spontaneous) speaking tests on the Speak & Improve learning platform. The aim of the corpus release is to address a major challenge in developing L2 spoken language processing systems: the lack of publicly available data with high-quality...
- This 3-year project investigates language change in five urban dialects of Northern England: Derby, Newcastle, York, Leeds and Manchester. Data collection method: linguistic analysis of speech data (conversational and word list) from samples of different northern English urban communities. Data collection consisted of interviews, which included (1) some structured questions about the interviewee...
- Dynamic Dialects contains an articulatory, video-based corpus of speech samples from accents of English around the world. Videos in this corpus contain synchronised audio, ultrasound tongue imaging video and video of the moving lips. We are continuing to augment the database. The website contains three main resources: - A clickable Accent Map: clicking on points of the map will open up links to...
- CREMA-D is a dataset of 7,442 original clips from 91 actors (48 male and 43 female) aged between 20 and 74, from a variety of races and ethnicities (African American, Asian, Caucasian, Hispanic, and Unspecified). Actors spoke a selection of 12 sentences. The sentences were presented using one of six different emotions (Anger, Disgust, Fear, Happy,...
- The British National Corpus (BNC) is a 100-million-word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of late twentieth-century British English, both spoken and written. Access the data here: https://llds.ling-phil.ox.ac.uk/llds/xmlui/handle/20.500.14106/2554
- The Voices Obscured in Complex Environmental Settings (VOiCES) corpus is a Creative Commons speech dataset targeting acoustically challenging and reverberant environments, with robust labels and truth data for transcription, denoising, and speaker identification. It is one of the largest corpora to date with transcriptions and simultaneously recorded real-world noise. The details: -...
- Expressive Anechoic Recordings of Speech (EARS). Highlights: - 100 h of speech data from 107 speakers - high-quality recordings at 48 kHz in an anechoic chamber - high speaker diversity, with speakers of different ethnicities and ages ranging from 18 to 75 years - full dynamic range of human speech, from whispering to yelling - 18 minutes of freeform monologues per speaker - sentence...
Explore
Audio
- Multi-Speaker
- Accent/Region (3)
- British English (3)
- World Englishes (1)
- Conversation (1)
- Emotional Speech (2)
- Language (5)
- English (5)
- L2+ (1)
- Language Learning (1)
- Multi-Style (1)
- Speech in Noise (1)
Speech Production & Articulation
- Ultrasound (1)
Teaching Resources
Tags
- read speech
- audio data (5)
- English (4)
- adult (4)
- spontaneous speech (3)
- female (3)
- male (3)
- emotional speech (2)
- British (2)
- anechoic (1)
- fast speech (1)
- high pitch (1)
- loud speech (1)
- low pitch (1)
- shout (1)
- slow speech (1)
- whisper (1)
- environmental noise (1)
- noisy audio (1)
- reverberation (1)
- transcribed (1)
- angry (1)
- audiovisual (1)
- disgust (1)
- happy (1)
- older adult (1)
- sad (1)
- video (1)
- accent map (1)
- lip video (1)
- teaching resource (1)
- ultrasound tongue imaging (UTI) (1)
- Derby (1)
- English accents (1)
- Leeds (1)
- Manchester (1)
- Newcastle (1)
- York (1)
- conversation (1)
- L2 English (1)
- L2 speech (1)
- annotated (1)
- interview (1)
- language learning (1)