Full catalogue 155 resources

Page 3 of 11

Abstracts

Large Sundanese ASR training data set

Sundanese ASR training data set containing ~220K utterances of transcribed audio. The data set consists of wave files and a TSV file. The file utt_spk_text.tsv contains a FileID, a UserID, and the transcription of the audio in each file. The data set has been manually quality checked, but there might still be errors. This dataset was collected by Google in Indonesia.

View on openslr.org
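
The abstract above describes utt_spk_text.tsv as holding a FileID, a UserID, and a transcription per utterance. A minimal sketch of reading such a file follows; it assumes the three fields appear in that order, are tab-separated, and that there is no header row, none of which the abstract confirms. The sample rows and the names `SAMPLE`, `read_utterances`, and `group_by_speaker` are hypothetical and for illustration only.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for utt_spk_text.tsv.
# Assumed layout: FileID <TAB> UserID <TAB> transcription, no header row.
SAMPLE = (
    "file_0001\tspk_01\tplaceholder transcription one\n"
    "file_0002\tspk_01\tplaceholder transcription two\n"
    "\n"
    "file_0003\tspk_02\tplaceholder transcription three\n"
)


def read_utterances(handle):
    """Return a list of dicts, skipping rows that do not have three fields."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        if len(row) != 3:
            # The abstract warns that errors may remain, so skip odd rows.
            continue
        file_id, user_id, text = row
        rows.append({"file_id": file_id, "user_id": user_id, "text": text})
    return rows


def group_by_speaker(rows):
    """Map each UserID to the list of FileIDs it recorded."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["user_id"]].append(row["file_id"])
    return dict(grouped)


utterances = read_utterances(io.StringIO(SAMPLE))
by_speaker = group_by_speaker(utterances)
```

For the real file, the same function could be given `open("utt_spk_text.tsv", encoding="utf-8", newline="")` in place of the in-memory sample, again assuming the layout described above.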
Zeroth-Korean

Korean Open-source Speech Corpus for Speech Recognition by Zeroth Project. The data set contains transcribed audio data for Korean. There are 51.6 hours of transcribed Korean audio for training data (22,263 utterances, 105 people, 3000 sentences) and 1.2 hours of transcribed Korean audio for testing data (457 utterances, 10 people). This corpus also contains pre-trained/designed language model,...

View on openslr.org
ALFFA (African Languages in the Field: speech Fundamentals and Automation)

Transcribed speech data in Amharic, Swahili, and Wolof.

View on openslr.org
Aishell

Aishell is an open-source Chinese Mandarin speech corpus published by Beijing Shell Shell Technology Co., Ltd. 400 people from different accent areas in China were invited to participate in the recording, which was conducted in a quiet indoor environment using a high-fidelity microphone, with the audio downsampled to 16 kHz. The manual transcription accuracy is above 95%, through professional speech annotation...

View on openslr.org
A DataSet of word sequences through MRI

Haroldo Gomes

A composite dataset with eight videos (totaling the pronunciation of seventeen words, with intervals, sagittal plane, and gray scale), for experiments in computer vision, video processing, and articulation investigation of the vocal tract.

View on ieee-dataport.org
African Accented French

This corpus consists of approximately 22 hours of speech recordings. Transcripts are provided for all the recordings. The corpus can be divided into 3 parts: 1. Yaounde: collected by a team from the U.S. Military Academy's Center for Technology Enhanced Language Learning (CTELL) in 2003 in Yaoundé, Cameroon. It has recordings from 84 speakers, 48 male and 36...

View on openslr.org
PhonemeDF: A Synthetic Speech Dataset for Audio Deepfake Detection and Naturalness Evaluation

Vamshi Nallaguntla, Aishwarya Fursule, Shruti Kshirsagar + 1 other

PhonemeDF is a large-scale phoneme-level parallel dataset of real and synthetic speech (approximately 730 hours), designed for audio deepfake detection and speech naturalness evaluation. The dataset consists of real speech samples derived from a subset of the LibriSpeech corpus (train-clean-100) and corresponding synthetic speech generated using four Text-to-Speech (TTS) systems (MeloTTS,...

View on zenodo.org
ASVspoof 5: Design, Collection and Validation of Resources for Spoofing, Deepfake, and Adversarial Attack Detection Using Crowdsourced Speech

Xin Wang, Héctor Delgado, Hemlata Tak + 26 others

This is the Zenodo repository for the ASVspoof 5 database. ASVspoof 5 is the fifth edition in a series of challenges which promote the study of speech spoofing and deepfake attacks, and the design of detection solutions. Compared to previous challenges, the ASVspoof 5 database is built from crowdsourced data collected from around 2,000 speakers in diverse acoustic conditions. More than 20...

View on zenodo.org
PartialSpoof Database - Partially Spoofed Audio Dataset for Anti-spoofing

Lin Zhang, Xin Wang, Erica Cooper + 3 others

All existing databases of spoofed speech contain attack data that is spoofed in its entirety. In practice, it is entirely plausible that successful attacks can be mounted with utterances that are only partially spoofed. By definition, partially-spoofed utterances contain a mix of both spoofed and bona fide segments, which will likely degrade the performance of countermeasures trained with...

View on zenodo.org
A multimodal speech-production dataset with time-aligned articulography, EEG, audio, and vocal-tract anatomy

Daniel Friedrichs, Valeriia Vyshnevetska, Monica Patricia Lancheros Pompeyo + 3 others

View on www.swissubase.ch
rtMRIDB Speech Organ Contour Data Ver. 0.9

Kikuo Maekawa, Hironori Takemoto

We are releasing the rtMRIDB Speech Organ Contour Data Ver. 0.9 (abb. rtMRI_cont). This dataset provides numerical data extracted from each frame of the real-time MRI videos published in the Realtime MRI Articulatory Movement Database, Ver. 2 (rtMRIDB_v2) [1], containing contour information of speech organs. Since this dataset may be updated in the near future, it is being released as a...

View on rtmridb.ninjal.ac.jp
rtMRIDB (The real-time MRI articulatory movement database)

Kikuo Maekawa

This is a database of moving images of the midsagittal section of the vocal tract during the production of Japanese utterances, recorded at a rate of 14 or 27 frames per second by using a medical MRI system with special operating settings. This data has realized the dream of articulatory phoneticians to visualize articulatory movements and may be widely used for critical review of the existing...

View on rtmridb.ninjal.ac.jp
The Edinburgh International Accents of English Corpus

Ramon Sanabria, Nina Markl, Andrea Carmantini + 4 others

English is the most widely spoken language in the world, used daily by millions of people as a first or second language in many different contexts. As a result, there are many varieties of English. Despite the many advances in English automatic speech recognition (ASR) over the past decades, results are usually reported based on test datasets which fail to represent the diversity of...

View on datashare.ed.ac.uk
The Sociolinguistic Archive and Analysis Project (SLAAP)

Tyler Kendall

The Sociolinguistic Archive and Analysis Project, at North Carolina State University, is an interactive web-based archive of sociolinguistic recordings, with integrated media playing and annotation features, as well as phonetic analysis and corpus analysis tools designed for enabling and improving empirical linguistic inquiry. The archive continues to grow over time. It currently contains (as...

View on slaap.chass.ncsu.edu
Synthetic vowels generated with 1D and 3D acoustic models

Rémi Blandin, Simon Stone, Angélique Remacle + 2 others

This dataset contains the synthetic stimuli used in the study published in the paper "A Comparative Study of 3D and 1D Acoustic Simulations of the Higher Frequencies of Speech". The goal of this study was to evaluate the accuracy of the acoustic wave propagation in the vocal tract in a source-filter synthesis paradigm with two perceptual experiments. The high frequencies (above 4 kHz) of the...

View on ieee-dataport.org

Last update from database: 26/05/2026, 04:10 (UTC)
