Voices Obscured in Complex Environmental Settings (VOiCES)

Resource type

Authors/contributors

Richey, Colleen (Author)
Barrios, Maria A. (Author)
Armstrong, Zeb (Author)
Bartels, Chris (Author)
Franco, Horacio (Author)
Graciarena, Martin (Author)
Lawson, Aaron (Author)
Nandwana, Mahesh Kumar (Author)
Stauffer, Allen (Author)
Hout, Julien van (Author)
Gamble, Paul (Author)
Hetherly, Jeff (Author)
Stephenson, Cory (Author)
Ni, Karl (Author)

Title

Voices Obscured in Complex Environmental Settings (VOiCES)

Abstract

The Voices Obscured in Complex Environmental Settings (VOiCES) corpus is a Creative Commons speech dataset targeting acoustically challenging and reverberant environments, with robust labels and truth data for transcription, denoising, and speaker identification. This is one of the largest corpora to date that has transcriptions and simultaneously recorded real-world noise. The details:
- Source Material: a total of 15 hours (3,903 audio files)
- Language: audio contains English read speech from male and female speakers
- Simulated Head Movement: the loudspeaker playing the foreground speech was on a motorized rotating platform
- Distractor Noise: a large collection containing television, music, babble noise, and HVAC at various SNRs
- Multiple Rooms: large, medium, and small, with varying reverberation

Date

2018

Citation Key

richey.etal_2018

URL

https://iqtlabs.github.io/voices/

Accessed

22/11/2024, 14:04

Citation

Richey, C., Barrios, M. A., Armstrong, Z., Bartels, C., Franco, H., Graciarena, M., Lawson, A., Nandwana, M. K., Stauffer, A., Hout, J. van, Gamble, P., Hetherly, J., Stephenson, C., & Ni, K. (2018). Voices Obscured in Complex Environmental Settings (VOiCES). https://iqtlabs.github.io/voices/

Audio Data