Results: 8 resources
- DECTE is an amalgamation of the existing Newcastle Electronic Corpus of Tyneside English (NECTE), created between 2001 and 2005, and NECTE2, a collection of interviews conducted in the Tyneside area since 2007. It thereby constitutes a rare example of a publicly available on-line corpus presenting dialect material spanning five decades.
- The Buckeye Corpus of conversational speech contains high-quality recordings of 40 speakers in Columbus, OH, conversing freely with an interviewer. The speech has been orthographically transcribed and phonetically labeled. The audio and text files, together with time-aligned phonetic labels, are stored in a format for use with speech analysis software (Xwaves and Wavesurfer).
- This 3-year project investigates language change in five urban dialects of Northern England: Derby, Newcastle, York, Leeds and Manchester. Data collection method: linguistic analysis of speech data (conversational, word list) from samples of different northern English urban communities. Data collection consisted of interviews, which included (1) some structured questions about the interviewee...
- The MSP-Conversation corpus contains interactions annotated with time-continuous emotional traces for arousal (calm to active), valence (negative to positive), and dominance (weak to strong). Time-continuous annotations offer the flexibility to explore emotional displays at different temporal resolutions while leveraging contextual information. Release 1.0 contains 74 conversations with...
- Dynamic Dialects contains an articulatory video-based corpus of speech samples from world-wide accents of English. Videos in this corpus contain synchronised audio, ultrasound-tongue-imaging video and video of the moving lips. We are continuing to augment the database. The website contains three main resources: - A clickable Accent Map: clicking on points of the map will open up links to...
- The MSP-Podcast corpus contains speech segments from podcast recordings which are perceptually annotated using crowdsourcing. The collection of this corpus is an ongoing process. Version 1.11 of the corpus has 151,654 speaking turns (237 hours and 56 mins). The proposed partition attempts to create speaker-independent datasets for Train, Development, Test1, Test2, and Test3 sets.
- The British National Corpus (BNC) is a 100-million-word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of British English from the late twentieth century. Access the data here: https://llds.ling-phil.ox.ac.uk/llds/xmlui/handle/20.500.14106/2554
- The CSTR VCTK Corpus includes speech data uttered by 110 English speakers with various accents. Each speaker reads out about 400 sentences, which were selected from a newspaper, the Rainbow Passage and an elicitation paragraph used for the Speech Accent Archive.