Full catalogue 155 resources

Page 9 of 11

Abstracts

Speech Rate: Praat script that detects syllable nuclei

Nivja de Jong, Ton Wempe

Praat script that automatically detects syllable nuclei in order to measure speech rate without the need for a transcription. Peaks in intensity (dB) that are preceded and followed by dips in intensity are treated as potential syllable nuclei. The script subsequently discards peaks that are not voiced.
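
As a rough illustration of the idea in this abstract (not the Praat script itself), the following Python sketch keeps intensity peaks that are flanked by dips and drops unvoiced ones. The function names, the `min_dip_db` threshold, and the frame-level `voiced` flags are assumptions made for this example; the actual script's parameters and processing steps may differ.

```python
def find_syllable_nuclei(intensity_db, voiced, min_dip_db=2.0):
    """Return frame indices of candidate syllable nuclei.

    intensity_db: per-frame intensity values in dB.
    voiced: per-frame booleans (True if the frame is voiced).
    min_dip_db: hypothetical minimum drop on each side of a peak.
    """
    # Step 1: find local maxima in the intensity contour.
    peaks = [
        i for i in range(1, len(intensity_db) - 1)
        if intensity_db[i - 1] < intensity_db[i] >= intensity_db[i + 1]
    ]

    nuclei = []
    for k, p in enumerate(peaks):
        # Step 2: find the lowest intensity between this peak and its
        # neighbouring peaks (or the edges of the signal).
        left_start = peaks[k - 1] if k > 0 else 0
        right_end = peaks[k + 1] if k + 1 < len(peaks) else len(intensity_db) - 1
        left_dip = min(intensity_db[left_start:p + 1])
        right_dip = min(intensity_db[p:right_end + 1])

        flanked_by_dips = (
            intensity_db[p] - left_dip >= min_dip_db
            and intensity_db[p] - right_dip >= min_dip_db
        )
        # Step 3: discard peaks that are not voiced.
        if flanked_by_dips and voiced[p]:
            nuclei.append(p)
    return nuclei


def speech_rate(nuclei, duration_s):
    """Syllable nuclei per second over the given duration."""
    return len(nuclei) / duration_s


# Toy contour: three intensity peaks, the last one unvoiced.
intensity = [50, 60, 52, 61, 50, 58, 49]
voicing = [False, True, True, True, True, False, False]
nuclei = find_syllable_nuclei(intensity, voicing)
```

In this toy contour, all three peaks are flanked by dips of at least 2 dB, but the last one falls on an unvoiced frame and is discarded.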

View on sites.google.com
SpeechBox: digital speech corpora

Anne R Bradlow

SpeechBox is a set of multiple speech resources. Each will be added to the YorVoice Data Catalogue as individual, searchable resources soon.

View on speechbox.linguistics.northwestern.edu
CHILDES Spanish-English Hoff Corpus

Erika Hoff

These transcripts and video files are samples of Spanish and English caregiver (almost always mother)-child interaction collected at child ages 2 ½, 3, and 3 ½ years as part of a 10-year longitudinal study of the language and literacy development of U.S.-born children raised in Spanish-speaking homes. Each recording is approximately 30 minutes in length. The caregiver and target child are...

View on childes.talkbank.org
Multimodal Signal Processing (MSP) Podcast corpus

The MSP-Podcast corpus contains speech segments from podcast recordings which are perceptually annotated using crowdsourcing. The collection of this corpus is an ongoing process. Version 1.11 of the corpus has 151,654 speaking turns (237 hours and 56 mins). The proposed partition attempts to create speaker-independent datasets for Train, Development, Test1, Test2, and Test3 sets.

View on ecs.utdallas.edu
Emotional-Speech-Data

HLTSingapore

This dataset contains 350 parallel utterances spoken by 10 native Mandarin speakers and 10 English speakers in 5 emotional states (neutral, happy, angry, sad, and surprise). The transcripts are provided.

View on github.com
Crowd Sourced Emotional Multimodal Actors Dataset (CREMA-D)

Houwei Cao, David G Cooper, Michael K Keutmann + 3 others

CREMA-D is a data set of 7,442 original clips from 91 actors. These clips were from 48 male and 43 female actors between the ages of 20 and 74 coming from a variety of races and ethnicities (African American, Asian, Caucasian, Hispanic, and Unspecified). Actors spoke from a selection of 12 sentences. The sentences were presented using one of six different emotions (Anger, Disgust, Fear, Happy,...

View on github.com
British National Corpus

The British National Corpus (BNC) is a 100 million word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of British English, both spoken and written, from the late twentieth century. Access the data here: https://llds.ling-phil.ox.ac.uk/llds/xmlui/handle/20.500.14106/2554

View on www.natcorp.ox.ac.uk
Voices Obscured in Complex Environmental Settings (VOiCES)

Colleen Richey, Maria A. Barrios, Zeb Armstrong + 11 others

The Voices Obscured in Complex Environmental Settings (VOiCES) corpus is a Creative Commons speech dataset targeting acoustically challenging and reverberant environments with robust labels and truth data for transcription, denoising, and speaker identification. This is one of the largest corpora to date that has transcriptions and simultaneously recorded real-world noise. The details: -...

View on iqtlabs.github.io
AudioSet

A sound vocabulary and dataset. AudioSet consists of an expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTube videos. The ontology is specified as a hierarchical graph of event categories, covering a wide range of human and animal sounds, musical instruments and genres, and common everyday environmental sounds. By...

View on research.google.com
openslr.org

Open SLR is a set of multiple speech resources. Each will be added to the YorVoice Data Catalogue as individual, searchable resources soon.

View on www.openslr.org
CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit (version 0.92)

Junichi Yamagishi, Christophe Veaux, Kirsten MacDonald

This CSTR VCTK Corpus includes speech data uttered by 110 English speakers with various accents. Each speaker reads out about 400 sentences, which were selected from a newspaper, the Rainbow Passage, and an elicitation paragraph used for the Speech Accent Archive.

View on datashare.ed.ac.uk
VoxForge

VoxForge is an open speech dataset that was set up to collect transcribed speech for use with Free and Open Source Speech Recognition Engines (on Linux, Windows and Mac).

View on www.voxforge.org
FoCal Toolkit

Niko Brummer

Toolkit for Evaluation, Fusion and Calibration of statistical pattern recognizers. At present the FoCal toolkit has two branches: The original FoCal is applicable to any two-class recognizer and has been specialized for the task of speaker detection, as found in the NIST Speaker Recognition

View on sites.google.com
Mozilla Common Voice

Common Voice is a project to help make voice recognition open to everyone. Developers need an enormous amount of voice data to build voice recognition technologies, and currently most of that data is expensive and proprietary. We want to make voice data freely and publicly available, and make sure the data represents the diversity of real people. Together we can make voice recognition better for everyone.

View on commonvoice.mozilla.org
ASVspoof

The automatic speaker verification spoofing and countermeasures (ASVspoof) challenge series is a community-led initiative which aims to promote the consideration of spoofing and deepfakes and the development of countermeasures.

View on www.asvspoof.org

Last update from database: 05/07/2026, 04:10 (UTC)
