Search

Full catalogue 113 resources

Page 5 of 8

Abstracts

An open-source toolbox for measuring vocal tract shape from real-time magnetic resonance images

Michel Belyk, Christopher Carignan, Carolyn McGettigan

Abstract Real-time magnetic resonance imaging (rtMRI) is a technique that provides high-contrast videographic data of human anatomy in motion. Applied to the vocal tract, it is a powerful method for capturing the dynamics of speech and other vocal behaviours by imaging structures internal to the mouth and throat. These images provide a means of studying the physiological basis for...

View on link.springer.com
Forensic Voice Comparison Databases: Standard Chinese 68 female speakers

C. Zhang, Geoffrey Stewart Morrison

This database contains two non-contemporaneous recordings of each of 68 female speakers of Standard Chinese (a.k.a. Mandarin and Putonghua). 60 of the speakers are from north eastern China, and 8 are from southern China. Each speaker was recorded in three speaking styles: - casual telephone conversation (cnv) - information exchange task over the telephone (fax) - pseudo-police-style interview (int)

View on forensic-voice-comparison.net
Forensic Voice Comparison Databases: forensic_eval_01

Geoffrey Stewart Morrison, Ewald Enzinger

Multi-laboratory evaluation of forensic voice comparison systems under conditions reflecting those of a real forensic case. There is increasing pressure on forensic laboratories to validate the performance of forensic analysis systems before they are used to assess strength of evidence for presentation in court (including pressure from the recently released report by the President’s Council...

View on forensic-voice-comparison.net
Forensic Voice Comparison Databases: AusEng 500+

G. S. Morrison, C. Zhang, E. Enzinger + 8 others

Forensic database of voice recordings of 500+ Australian English speakers (AusEng 500+). This database contains 3899 recordings totalling 310 hours of speech from 555 Australian-English speakers. 324 female speakers: - 91 recorded in one recording session - 69 recorded in two separate recording sessions - 159 recorded in three recording sessions - 5 recorded in more than three recording...

View on forensic-voice-comparison.net
Multimodal Signal Processing (MSP) Conversation corpus

Luz Martinez-Lucas, Mohammed Abdelwahab, Carlos Busso

The MSP-Conversation corpus contains interactions annotated with time-continuous emotional traces for arousal (calm to active), valence (negative to positive), and dominance (weak to strong). Time-continuous annotations offer the flexibility to explore emotional displays at different temporal resolutions while leveraging contextual information. Release 1.0 contains 74 conversations with...

View on ecs.utdallas.edu
WhisperX: Time-Accurate Speech Transcription of Long-Form Audio

Max Bain, Jaesung Huh, Tengda Han + 1 others

Large-scale, weakly-supervised speech recognition models, such as Whisper, have demonstrated impressive results on speech recognition across domains and languages. However, their application to long audio transcription via buffered or sliding window approaches is prone to drifting, hallucination and repetition; and prohibits batched transcription due to their sequential nature. Further,...

View on arxiv.org
Robust Speech Recognition via Large-Scale Weak Supervision

Alec Radford, Jong Wook Kim, Tao Xu + 3 others

We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual and multitask supervision, the resulting models generalize well to standard benchmarks and are often competitive with prior fully supervised results but in a zero-shot transfer setting without the need for any...

View on arxiv.org
Emotional voice conversion: Theory, databases and ESD

Kun Zhou, Berrak Sisman, Rui Liu + 1 others

In this paper, we first provide a review of the state-of-the-art emotional voice conversion research, and the existing emotional speech databases. We then motivate the development of a novel emotional speech database (ESD) that addresses the increasing research need. With this paper, the ESD database1 is now made available to the research community. The ESD database consists of 350 parallel...

View on linkinghub.elsevier.com
Intonational variation in Arabic Corpus

Sam Hellmuth, Rana Almbark

Twenty five countries have Arabic as an official language, but the dialects spoken vary greatly, and even within one country different accents are heard. Many features create the impression of 'a different accent', including how particular sounds are pronounced, where stress falls in a word, and what intonation pattern is used. There is extensive prior research on the first two of these for...

View on reshare.ukdataservice.ac.uk
Speech Zone

Simon King

Hi – my name is Simon King and this is my personal website for supporting my teaching. I am the Professor of Speech Processing at the University of Edinburgh, where I teach courses in speech processing and speech synthesis at advanced undergraduate and Masters level. Use of this website Students: You may use this website freely for personal use. You may download copies of the content for your...

View on speech.zone
Seeing Speech

E. Lawson, J. Stuart-Smith, J. M. Scobbie + 1 others

Welcome to our interactive International Phonetic Association (IPA) chart website! Clicking on the IPA symbols on our charts will allow you to listen to their sounds and see vocal-organ movements imaged with ultrasound, MRI, or in animated form. To find out more about how our IPA charts were made, click on the buttons on the left-hand side of this page. The website contains two main...

View on seeingspeech.ac.uk
Dynamic Dialects

E. Lawson, J. Stuart-Smith, J. M. Scobbie + 1 others

Dynamic Dialects contains an articulatory video-based corpus of speech samples from world-wide accents of English. Videos in this corpus contain synchronised audio, ultrasound-tongue-imaging video and video of the moving lips. We are continuing to augment the database. The website contains three main resources: - A clickable Accent Map: clicking on points of the map will open up links to...

View on www.dynamicdialects.ac.uk
mngu0

Korin Richmond

This is a corpus of articulatory data of different forms (EMA, MRI, video, 3D scans of upper/lower jaw, audio etc.) acquired from one male British English speaker.

View on www.mngu0.org
USC Speech and Vocal Tract Morphology MRI Database

Tanner Sorensen, Zisis Skordilis, Asterios Toutios + 9 others

The USC Speech and Vocal Tract Morphology MRI Database consists of real-time magnetic resonance images of dynamic vocal tract shaping during read and spontaneous speech with concurrently recorded denoised audio, and 3D volumetric MRI of vocal tract shapes during vowels and continuant consonants sustained for 7 seconds, from 17 speakers.

View on sail.usc.edu
USC-EMO-MRI: An emotional speech production database

Jangwon Kim, Asterios Toutios, Yoon-Chul Kim + 3 others

USC-EMO-MRI is an emotional speech production database which includes real-time magnetic resonance imaging data with synchronized speech audio from five male and five female actors, each producing a passage and a set of sentences in multiple repetitions, while enacting four different target emotions (neutral, happy, angry, sad). The database includes emotion quality evaluation from at least...

View on sail.usc.edu

Page 5 of 8

Main feed

Last update from database: 16/07/2025, 04:10 (UTC)

Search

Full catalogue 113 resources

Explore

Audio

Benchmarks & Validation

Derived & Measured Data

Software, Processing & Utilities

Speech Production & Articulation

Teaching Resources

Vocal Anatomy

Tags

Resource type