FoR: Fake or Real
Resource type
Authors/contributors
- Reimao, Ricardo (Author)
- Tzerpos, Vassilios (Author)
Title
FoR: Fake or Real
Abstract
The Fake-or-Real (FoR) dataset is a collection of more than 195,000 utterances of real human speech and computer-generated speech. The dataset can be used to train classifiers to detect synthetic speech.
The dataset aggregates data from the latest TTS solutions (such as Deep Voice 3 and Google WaveNet TTS) as well as a variety of real human speech, including the Arctic Dataset (http://festvox.org/cmu_arctic/), LJSpeech Dataset (https://keithito.com/LJ-Speech-Dataset/), VoxForge Dataset (http://www.voxforge.org) and our own speech recordings.
The dataset is published in four versions: for-original, for-norm, for-2sec and for-rerec.
The first version, named for-original, contains the files as collected from the speech sources, without any modification (unbalanced version).
The second version, called for-norm, contains the same files, but balanced in terms of gender and class and normalized in terms of sample rate, volume and number of channels.
The third version, named for-2sec, is based on the second one, but with the files truncated at 2 seconds.
The last version, named for-rerec, is a rerecorded version of the for-2sec dataset, made to simulate a scenario where an attacker sends an utterance through a voice channel (e.g. a phone call or a voice message).
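The truncation step that distinguishes for-2sec from for-norm can be illustrated with a minimal sketch. It is not the authors' processing pipeline: the function names (truncate_wav, make_silence, duration_seconds), the 16 kHz sample rate and the silent stand-in clip are all assumptions made for this example, and only Python's standard wave module is used.

```python
import io
import wave


def truncate_wav(wav_bytes: bytes, max_seconds: float = 2.0) -> bytes:
    """Return a copy of a PCM WAV clip cut to at most max_seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        # Number of frames that fit in the requested duration.
        max_frames = int(params.framerate * max_seconds)
        frames = src.readframes(min(max_frames, params.nframes))

    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        # Keep the source format; only the length changes.
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(params.sampwidth)
        dst.setframerate(params.framerate)
        dst.writeframes(frames)
    return out.getvalue()


def make_silence(seconds: float, framerate: int = 16000) -> bytes:
    """Build a mono 16-bit silent clip (hypothetical stand-in data)."""
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(framerate)
        dst.writeframes(b"\x00\x00" * int(framerate * seconds))
    return out.getvalue()


def duration_seconds(wav_bytes: bytes) -> float:
    """Duration of a WAV clip, computed from its frame count and rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as clip:
        return clip.getnframes() / clip.getframerate()


if __name__ == "__main__":
    clip = make_silence(3.0)
    print(duration_seconds(truncate_wav(clip)))
```

Clips shorter than the limit pass through unchanged in length; whether the published for-2sec version pads, drops or keeps such files is not stated in this record.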
Citation Key
_bj
Citation
Reimao, R., & Tzerpos, V. (n.d.). FoR: Fake or Real [Dataset]. Retrieved from https://bil.eecs.yorku.ca/datasets/
Audio Data