Search

Full catalogue 115 resources

Page 7 of 8

Abstracts

CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit (version 0.92)

Junichi Yamagishi, Christophe Veaux, Kirsten MacDonald

This CSTR VCTK Corpus includes speech data uttered by 110 English speakers with various accents. Each speaker reads out about 400 sentences, which were selected from a newspaper, the rainbow passage and an elicitation paragraph used for the speech accent archive.

View on datashare.ed.ac.uk
VoxForge

VoxForge is an open speech dataset that was set up to collect transcribed speech for use with Free and Open Source Speech Recognition Engines (on Linux, Windows and Mac).

View on www.voxforge.org
FoCal Toolkit

Niko Brummer

Toolkit for Evaluation, Fusion and Calibration of statistical pattern recognizers At present the FoCal toolkit has two branches: The original FoCal is applicable to any two-class recognizer and has been specialized for the task of speaker detection, as found in the NIST Speaker Recognition

View on sites.google.com
Mozilla Common Voice

Common Voice is a project to help make voice recognition open to everyone. Developers need an enormous amount of voice data to build voice recognition technologies, and currently most of that data is expensive and proprietary. We want to make voice data freely and publicly available, and make sure the data represents the diversity of real people. Together we can make voice recognition better for everyone.

View on commonvoice.mozilla.org
ASVspoof

The automatic speaker verification spoofing and countermeasures (ASVspoof) challenge series is a community-led initiative which aims to promote the consideration of spoofing and deepfakes and the development of countermeasures.

View on www.asvspoof.org
EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation

Julius Richter, Yi-Chiao Wu, Steven Krenn + 5 others

We release the EARS (Expressive Anechoic Recordings of Speech) dataset, a high-quality speech dataset comprising 107 speakers from diverse backgrounds, totalling in 100 hours of clean, anechoic speech data. The dataset covers a large range of different speaking styles, including emotional speech, different reading styles, non-verbal sounds, and conversational freeform speech. We benchmark...

View on www.isca-archive.org
EARS Dataset

Expressive Anechoic Recordings of Speech (EARS). Highlights: - 100 h of speech data from 107 speakers - high-quality recordings at 48 kHz in an anechoic chamber - high speaker diversity with speakers from different ethnicities and age range from 18 to 75 years - full dynamic range of human speech, ranging from whispering to yelling - 18 minutes of freeform monologues per speaker - sentence...

View on github.com
Pyannote

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

View on github.com
Silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

View on github.com
SpeechBrain

Mirco Ravanelli, Titouan Parcollet, Peter Plantinga + 18 others

SpeechBrain is an open-source PyTorch toolkit that accelerates Conversational AI development, i.e., the technology behind speech assistants, chatbots, and large language models. It is crafted for fast and easy creation of advanced technologies for Speech and Text Processing.

View on github.com
Opensmile

openSMILE (open-source Speech and Music Interpretation by Large-space Extraction) is a complete and open-source toolkit for audio analysis, processing and classification especially targeted at speech and music applications, e.g. automatic speech recognition, speaker identification, emotion recognition, or beat tracking and chord detection.

View on github.com
Whisper

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.

View on github.com
WhisperX

Max Bain

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

View on github.com
CrisperWhisper

Laurin Wagner,

Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection

View on github.com
VoiceSauce

Y.-L. Shue

VoiceSauce is an application, implemented in Matlab, which provides automated voice measurements over time from audio recordings. Inputs are standard wave (*.wav) files and the measures currently computed are: F0 Formants F1-F4 H1(*) H2(*) H4(*) A1(*) A2(*) A3(*) 2K(*) 5K H1(*)-H2(*) H2(*)-H4(*) H1(*)-A1(*) H1(*)-A2(*) H1(*)-A3(*) H4(*)-2K(*) 2K(*)-5K Energy Cepstral Peak Prominence Harmonic...

View on phonetics.ucla.edu

Page 7 of 8

Main feed

Last update from database: 19/02/2026, 04:10 (UTC)

Search

Full catalogue 115 resources

Explore

Audio Data

Derived & Measured Data

Software, Processing & Utilities

Speech Production Data

Teaching Resources

Tags

Resource type