Results | YorVoice Catalogue

VOICEBOX: Speech Processing Toolbox for MATLAB

VOICEBOX is a speech processing toolbox consisting of MATLAB routines that are maintained and mostly written by Mike Brookes, Department of Electrical & Electronic Engineering, Imperial College, Exhibition Road, London SW7 2BT, UK. The routines are available as a GitHub repository (or as a zip archive, which is often slightly out of date) and are made available under the terms of the GNU Public...

View on www.ee.ic.ac.uk

Semi-automatic mandible segmentation (SAMS) pipeline

View on samsdoc.readthedocs.io

Formant Tracking in Children’s Speech

Brad Story, Kate Bunton

Children’s speech presents a challenging problem for formant frequency measurement. In part, this is because high fundamental frequencies, typical of children’s speech production, generate widely spaced harmonic components that may undersample the spectral shape of the vocal tract transfer function. In addition, there is often a weakening of upper harmonic energy and a noise component due to...

View on sites.arizona.edu
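
The undersampling problem described in the entry above can be illustrated with a little arithmetic. The sketch below is not taken from the linked resource: the two F0 values and the three formant frequencies are assumed purely for illustration, and the helper names are hypothetical. It counts how many harmonics fall below 4 kHz and how far each assumed formant peak lies from its nearest harmonic.

```python
# Illustrative sketch only: F0 values and formant frequencies are assumptions,
# not measurements from the Story & Bunton resource.

def harmonics_below(f0_hz, max_hz):
    """Return the harmonic frequencies k * f0 that lie at or below max_hz."""
    return [k * f0_hz for k in range(1, int(max_hz // f0_hz) + 1)]

def nearest_harmonic_offset(f0_hz, formant_hz):
    """Distance in Hz from a formant peak to the closest harmonic of f0."""
    k = max(1, round(formant_hz / f0_hz))
    return abs(formant_hz - k * f0_hz)

# Hypothetical formant peaks (Hz), chosen only to make the point.
ASSUMED_FORMANTS_HZ = [1000, 1700, 3300]

for f0 in (120, 400):  # assumed low and high fundamental frequencies
    count = len(harmonics_below(f0, 4000))
    offsets = [nearest_harmonic_offset(f0, f) for f in ASSUMED_FORMANTS_HZ]
    print(f"F0 = {f0} Hz: {count} harmonics below 4 kHz, "
          f"offsets to assumed formants (Hz): {offsets}")
```

With a higher F0 there are fewer harmonics available to sample the spectral envelope, and a formant peak can fall well away from any harmonic, which is the difficulty the description refers to.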

An open-source toolbox for measuring vocal tract shape from real-time magnetic resonance images

Michel Belyk

Real-time magnetic resonance imaging (rtMRI) is a technique that provides high-contrast videographic data of human anatomy in motion. Applied to the vocal tract, it is a powerful method for capturing the dynamics of speech and other vocal behaviours by imaging structures internal to the mouth and throat. These images provide a means of studying the physiological basis for speech, singing,...

View on osf.io

Emotional-Speech-Data

HLTSingapore

This dataset contains 350 parallel utterances spoken by 10 native Mandarin speakers and 10 English speakers in 5 emotional states (neutral, happy, angry, sad, and surprise). The transcripts are provided.

View on github.com

FoCal Toolkit

Niko Brummer

Toolkit for Evaluation, Fusion and Calibration of statistical pattern recognizers. At present the FoCal toolkit has two branches: the original FoCal is applicable to any two-class recognizer and has been specialized for the task of speaker detection, as found in the NIST Speaker Recognition...

View on sites.google.com

Pyannote

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

View on github.com

Silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

View on github.com

SpeechBrain

Mirco Ravanelli, Titouan Parcollet, Peter Plantinga + 18 others

SpeechBrain is an open-source PyTorch toolkit that accelerates Conversational AI development, i.e., the technology behind speech assistants, chatbots, and large language models. It is crafted for fast and easy creation of advanced technologies for Speech and Text Processing.

View on github.com

Opensmile

openSMILE (open-source Speech and Music Interpretation by Large-space Extraction) is a complete and open-source toolkit for audio analysis, processing and classification especially targeted at speech and music applications, e.g. automatic speech recognition, speaker identification, emotion recognition, or beat tracking and chord detection.

View on github.com

Whisper

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.

View on github.com

WhisperX

Max Bain

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

View on github.com

CrisperWhisper

Laurin Wagner

Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection

View on github.com

VoiceSauce

Y.-L. Shue

VoiceSauce is an application, implemented in MATLAB, which provides automated voice measurements over time from audio recordings. Inputs are standard wave (*.wav) files and the measures currently computed are: F0, Formants F1-F4, H1(*), H2(*), H4(*), A1(*), A2(*), A3(*), 2K(*), 5K, H1(*)-H2(*), H2(*)-H4(*), H1(*)-A1(*), H1(*)-A2(*), H1(*)-A3(*), H4(*)-2K(*), 2K(*)-5K, Energy, Cepstral Peak Prominence, Harmonic...

View on phonetics.ucla.edu

SoX - Sound eXchange

SoX is the Swiss Army Knife of sound processing utilities. It can convert audio files to other popular audio file types and also apply sound effects and filters during the conversion.

View on sourceforge.net

Results 19 resources
