Results | YorVoice Catalogue

Real-time speech MRI datasets with corresponding articulator ground-truth segmentations

Matthieu Ruthven, Agnieszka M. Peplinski, David M. Adams + 2 others

Abstract: The use of real-time magnetic resonance imaging (rt-MRI) of speech is increasing in clinical practice and speech science research. Analysis of such images often requires segmentation of articulators and the vocal tract, and the community is turning to deep-learning-based methods to perform this segmentation. While there are publicly available rt-MRI datasets of speech,...

View on www.nature.com

Forensic Voice Comparison Databases: forensic_eval_01

Geoffrey Stewart Morrison, Ewald Enzinger

Multi-laboratory evaluation of forensic voice comparison systems under conditions reflecting those of a real forensic case. There is increasing pressure on forensic laboratories to validate the performance of forensic analysis systems before they are used to assess strength of evidence for presentation in court (including pressure from the recently released report by the President’s Council...

View on forensic-voice-comparison.net

Forensic Voice Comparison Databases: AusEng 500+

G. S. Morrison, C. Zhang, E. Enzinger + 8 others

Forensic database of voice recordings of 500+ Australian English speakers (AusEng 500+). This database contains 3899 recordings totalling 310 hours of speech from 555 Australian-English speakers. 324 female speakers:
- 91 recorded in one recording session
- 69 recorded in two separate recording sessions
- 159 recorded in three recording sessions
- 5 recorded in more than three recording...

View on forensic-voice-comparison.net

Multimodal Signal Processing (MSP) Conversation corpus

Luz Martinez-Lucas, Mohammed Abdelwahab, Carlos Busso

The MSP-Conversation corpus contains interactions annotated with time-continuous emotional traces for arousal (calm to active), valence (negative to positive), and dominance (weak to strong). Time-continuous annotations offer the flexibility to explore emotional displays at different temporal resolutions while leveraging contextual information. Release 1.0 contains 74 conversations with...

View on ecs.utdallas.edu

Seeing Speech

E. Lawson, J. Stuart-Smith, J. M. Scobbie + 1 other

Welcome to our interactive International Phonetic Association (IPA) chart website! Clicking on the IPA symbols on our charts will allow you to listen to their sounds and see vocal-organ movements imaged with ultrasound, MRI, or in animated form. To find out more about how our IPA charts were made, click on the buttons on the left-hand side of this page. The website contains two main...

View on seeingspeech.ac.uk

mngu0

Korin Richmond

This is a corpus of articulatory data of different forms (EMA, MRI, video, 3D scans of upper/lower jaw, audio etc.) acquired from one male British English speaker.

View on www.mngu0.org

USC-EMO-MRI: An emotional speech production database

Jangwon Kim, Asterios Toutios, Yoon-Chul Kim + 3 others

USC-EMO-MRI is an emotional speech production database which includes real-time magnetic resonance imaging data with synchronized speech audio from five male and five female actors, each producing a passage and a set of sentences in multiple repetitions, while enacting four different target emotions (neutral, happy, angry, sad). The database includes emotion quality evaluation from at least...

View on sail.usc.edu

USC-TIMIT

Shrikanth Narayanan, Asterios Toutios, Vikram Ramanarayanan + 12 others

USC-TIMIT is a database of speech production data under ongoing development, which currently includes real-time magnetic resonance imaging data from five male and five female speakers of American English, and electromagnetic articulography data from four of these speakers. The two modalities were recorded in two independent sessions while the subjects produced the same 460 sentence corpus. In...

View on sail.usc.edu

A multispeaker dataset of raw and reconstructed speech production real-time MRI video and 3D volumetric images

Yongwan Lim, Asterios Toutios, Yannick Bliesener + 16 others

Real-time magnetic resonance imaging (RT-MRI) of human speech production is enabling significant advances in speech science, linguistics, bio-inspired speech technology development, and clinical applications. Easy access to RT-MRI is however limited, and comprehensive datasets with broad access are needed to catalyze research across numerous domains. The imaging of the rapidly moving...

View on figshare.com

CHILDES Spanish-English Hoff Corpus

Erika Hoff

These transcripts and video files are samples of Spanish and English caregiver (almost always mother)-child interaction collected at child ages 2 ½, 3, and 3 ½ years as part of a 10-year longitudinal study of the language and literacy development of U.S.-born children raised in Spanish-speaking homes. Each recording is approximately 30 minutes in length. The caregiver and target child are...

View on childes.talkbank.org

Multimodal Signal Processing (MSP) Podcast corpus

The MSP-Podcast corpus contains speech segments from podcast recordings which are perceptually annotated using crowdsourcing. The collection of this corpus is an ongoing process. Version 1.11 of the corpus has 151,654 speaking turns (237 hours and 56 mins). The proposed partition attempts to create speaker-independent datasets for Train, Development, Test1, Test2, and Test3 sets.

View on ecs.utdallas.edu

Emotional-Speech-Data

HLTSingapore

This dataset contains 350 parallel utterances spoken by 10 native Mandarin speakers and 10 English speakers in 5 emotional states (neutral, happy, angry, sad, and surprise). The transcripts are provided.

View on github.com

British National Corpus

The British National Corpus (BNC) is a 100-million-word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of British English from the late twentieth century. Access the data here: https://llds.ling-phil.ox.ac.uk/llds/xmlui/handle/20.500.14106/2554

View on www.natcorp.ox.ac.uk

Voices Obscured in Complex Environmental Settings (VOiCES)

Colleen Richey, Maria A. Barrios, Zeb Armstrong + 11 others

The Voices Obscured in Complex Environmental Settings (VOiCES) corpus is a Creative Commons speech dataset targeting acoustically challenging and reverberant environments with robust labels and truth data for transcription, denoising, and speaker identification. This is one of the largest corpora to date that has transcriptions and simultaneously recorded real-world noise. The details: -...

View on iqtlabs.github.io

CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit (version 0.92)

Junichi Yamagishi, Christophe Veaux, Kirsten MacDonald

This CSTR VCTK Corpus includes speech data uttered by 110 English speakers with various accents. Each speaker reads out about 400 sentences, which were selected from a newspaper, the rainbow passage and an elicitation paragraph used for the speech accent archive.

View on datashare.ed.ac.uk

Results 30 resources