EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation

Resource type
Conference Paper
Authors/contributors
Richter, J., Wu, Y.-C., Krenn, S., Welker, S., Lay, B., Watanabe, S., Richard, A., & Gerkmann, T.
Title
EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation
Abstract
We release the EARS (Expressive Anechoic Recordings of Speech) dataset, a high-quality speech dataset comprising 107 speakers from diverse backgrounds, totalling 100 hours of clean, anechoic speech data. The dataset covers a large range of speaking styles, including emotional speech, different reading styles, non-verbal sounds, and conversational freeform speech. We benchmark various methods for speech enhancement and dereverberation on the dataset and evaluate their performance through a set of instrumental metrics. In addition, we conduct a listening test with 20 participants for the speech enhancement task, in which a generative method is preferred. We introduce a blind test set that allows for automatic online evaluation of uploaded data. Dataset download links and the automatic evaluation server can be found online.
Date
2024
Conference Name
Proc. Interspeech 2024
Pages
4873-4877
Short Title
EARS
Accessed
15/11/2024, 14:07
Citation
Richter, J., Wu, Y.-C., Krenn, S., Welker, S., Lay, B., Watanabe, S., Richard, A., & Gerkmann, T. (2024). EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation. 4873–4877. https://doi.org/10.21437/Interspeech.2024-153