Results: 19 resources
-
VOICEBOX is a speech processing toolbox that consists of MATLAB routines maintained and mostly written by Mike Brookes, Department of Electrical & Electronic Engineering, Imperial College, Exhibition Road, London SW7 2BT, UK. The routines are available as a GitHub repository (or as a zip archive, which is often slightly out of date) and are made available under the terms of the GNU Public...
-
Children’s speech presents a challenging problem for formant frequency measurement. In part, this is because the high fundamental frequencies typical of children’s speech generate widely spaced harmonic components that may undersample the spectral shape of the vocal tract transfer function. In addition, there is often a weakening of upper harmonic energy and a noise component due to...
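The undersampling argument can be made concrete with a little arithmetic: a formant peak is only "seen" where a harmonic of F0 lands near it. The sketch below counts the harmonics that fall inside a formant's bandwidth. The formant frequency (1000 Hz), bandwidth (200 Hz) and the function name `harmonics_in_band` are hypothetical illustration choices, not values from the resource.

```python
# Illustrative only: counts how many harmonics of F0 fall inside a formant's
# band (centre +/- bandwidth/2). The formant values used below are hypothetical.

def harmonics_in_band(f0_hz: float, centre_hz: float, bandwidth_hz: float) -> list[float]:
    """Return the harmonic frequencies k*f0 lying within centre +/- bandwidth/2."""
    lo = centre_hz - bandwidth_hz / 2
    hi = centre_hz + bandwidth_hz / 2
    harmonics = []
    k = 1
    while k * f0_hz <= hi:  # stop once harmonics pass the upper band edge
        f = k * f0_hz
        if f >= lo:
            harmonics.append(f)
        k += 1
    return harmonics


if __name__ == "__main__":
    # Hypothetical F1 of 1000 Hz with a 200 Hz bandwidth.
    for f0 in (100.0, 250.0, 400.0):
        hits = harmonics_in_band(f0, centre_hz=1000.0, bandwidth_hz=200.0)
        print(f"F0={f0:.0f} Hz: {len(hits)} harmonic(s) in band -> {hits}")
```

With a low, adult-like F0 several harmonics sample the peak; at a child-like 400 Hz none falls inside the band, so the peak location has to be inferred from harmonics on its flanks.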
-
Real-time magnetic resonance imaging (rtMRI) is a technique that provides high-contrast videographic data of human anatomy in motion. Applied to the vocal tract, it is a powerful method for capturing the dynamics of speech and other vocal behaviours by imaging structures internal to the mouth and throat. These images provide a means of studying the physiological basis for speech, singing,...
-
This dataset contains 350 parallel utterances spoken by 10 native Mandarin speakers and 10 English speakers in 5 emotional states (neutral, happy, angry, sad and surprise). Transcripts are provided.
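The corpus size follows from simple multiplication, but only under an assumption the description does not state explicitly: that every speaker records all 350 parallel utterances once in each of the 5 emotional states. A minimal sketch of that arithmetic (the function name `corpus_size` is hypothetical):

```python
# Assumption (not stated explicitly in the description): every speaker records
# all 350 parallel utterances once in each of the 5 emotional states.
EMOTIONS = ("neutral", "happy", "angry", "sad", "surprise")
UTTERANCES_PER_STATE = 350
SPEAKERS = {"Mandarin": 10, "English": 10}


def corpus_size(utterances: int, states: int, speakers: int) -> int:
    """Number of recordings if each speaker reads every utterance in every state."""
    return utterances * states * speakers


per_language = {
    lang: corpus_size(UTTERANCES_PER_STATE, len(EMOTIONS), n)
    for lang, n in SPEAKERS.items()
}
total = sum(per_language.values())
print(per_language, total)  # {'Mandarin': 17500, 'English': 17500} 35000
```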
-
Toolkit for Evaluation, Fusion and Calibration of statistical pattern recognizers. At present the FoCal toolkit has two branches: the original FoCal is applicable to any two-class recognizer and has been specialized for the task of speaker detection, as found in the NIST Speaker Recognition...
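Calibration of a two-class recognizer means mapping its raw scores to log-likelihood ratios (LLRs), and a standard way to evaluate the result is the log-likelihood-ratio cost, Cllr. The sketch below applies a fixed affine map `llr = a * score + b` and computes Cllr in bits. In FoCal such parameters are learned from training data (by logistic regression, with fusion adding one weight per system); here `a` and `b` are simply given, and the function names are hypothetical, not FoCal's API.

```python
import math


def _log2_1p_exp(x: float) -> float:
    """Numerically stable log2(1 + e^x)."""
    if x > 0:
        return (x + math.log1p(math.exp(-x))) / math.log(2)
    return math.log1p(math.exp(x)) / math.log(2)


def cllr(target_llrs, nontarget_llrs) -> float:
    """Log-likelihood-ratio cost in bits, for natural-log LLRs.

    Cllr = 0.5 * (mean_tar log2(1 + e^-llr) + mean_non log2(1 + e^llr))
    A system that always outputs llr = 0 scores exactly 1 bit.
    """
    tar = sum(_log2_1p_exp(-s) for s in target_llrs) / len(target_llrs)
    non = sum(_log2_1p_exp(s) for s in nontarget_llrs) / len(nontarget_llrs)
    return 0.5 * (tar + non)


def affine_calibrate(scores, a: float, b: float):
    """Map raw scores to LLRs with llr = a * score + b (a > 0 keeps the ranking)."""
    return [a * s + b for s in scores]


if __name__ == "__main__":
    # Hypothetical raw scores and calibration parameters.
    tar = affine_calibrate([2.0, 3.0, 1.5], a=1.2, b=-0.3)
    non = affine_calibrate([-1.0, 0.5, -2.0], a=1.2, b=-0.3)
    print(f"Cllr = {cllr(tar, non):.3f} bits")
```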
-
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
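Two of these building blocks interact simply: speaker embeddings computed on consecutive windows should stay similar within a speaker turn and diverge at a change. The toy sketch below flags a change wherever the cosine similarity of neighbouring embeddings drops below a threshold. The 2-D embeddings and the 0.5 threshold are hypothetical; real systems use learned, high-dimensional embeddings and more elaborate segmentation, and this is not the toolkit's own method.

```python
import math


def cosine(u, v) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def change_points(embeddings, threshold: float = 0.5):
    """Indices i where similarity between window i-1 and window i falls below threshold."""
    return [
        i
        for i in range(1, len(embeddings))
        if cosine(embeddings[i - 1], embeddings[i]) < threshold
    ]


# Hypothetical per-window embeddings: two windows of speaker A, then two of speaker B.
windows = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
print(change_points(windows))  # [2]
```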
-
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
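Voice activity detectors of this kind typically emit a speech probability for each short audio chunk, which must then be turned into time segments. The function below is a generic hysteresis post-processing sketch, not Silero VAD's own code; `probs_to_segments`, its onset/offset thresholds and the frame duration are illustrative assumptions.

```python
def probs_to_segments(probs, frame_s, onset=0.5, offset=0.35, min_speech_s=0.0):
    """Turn per-frame speech probabilities into (start_s, end_s) segments.

    A segment opens when a probability reaches `onset` and closes when it
    falls below the lower `offset` threshold (hysteresis), which avoids
    flickering around a single threshold. Segments shorter than
    `min_speech_s` seconds are dropped.
    """
    segments = []
    start = None
    for i, p in enumerate(probs):
        if start is None and p >= onset:
            start = i
        elif start is not None and p < offset:
            segments.append((start, i))
            start = None
    if start is not None:  # speech still active at the end of the input
        segments.append((start, len(probs)))
    return [
        (s * frame_s, e * frame_s)
        for s, e in segments
        if (e - s) * frame_s >= min_speech_s
    ]


if __name__ == "__main__":
    # Hypothetical probabilities for 0.5 s frames.
    probs = [0.1, 0.6, 0.4, 0.45, 0.2, 0.9, 0.8]
    print(probs_to_segments(probs, frame_s=0.5))  # [(0.5, 2.0), (2.5, 3.5)]
```

Note that frames 2 and 3 (0.4 and 0.45) stay inside the first segment: they are below the onset but above the offset threshold.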
-
SpeechBrain is an open-source PyTorch toolkit that accelerates Conversational AI development, i.e., the technology behind speech assistants, chatbots, and large language models. It is designed for the fast and easy creation of advanced speech and text processing technologies.
-
openSMILE (open-source Speech and Music Interpretation by Large-space Extraction) is a complete and open-source toolkit for audio analysis, processing and classification especially targeted at speech and music applications, e.g. automatic speech recognition, speaker identification, emotion recognition, or beat tracking and chord detection.
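Feature extraction in openSMILE is organized around frame-level low-level descriptors (LLDs) that are then summarized by statistical functionals. The sketch below mimics that two-stage idea in plain Python with two simple LLDs (RMS energy and zero-crossing rate) and four functionals. It is not openSMILE's code or feature set; the frame length and hop (400 and 160 samples, i.e. 25 ms and 10 ms assuming 16 kHz audio) and all function names are illustrative assumptions.

```python
import math
import statistics


def frames(signal, frame_len, hop):
    """Split a signal into overlapping frames (trailing partial frame dropped)."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]


def rms(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def functionals(values):
    """Summarize a descriptor contour with a few statistics."""
    return {
        "mean": statistics.fmean(values),
        "stdev": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
    }


def extract(signal, frame_len=400, hop=160):
    """LLDs per frame, then functionals per LLD. Assumes len(signal) >= frame_len."""
    fr = frames(signal, frame_len, hop)
    llds = {"rms": [rms(f) for f in fr], "zcr": [zero_crossing_rate(f) for f in fr]}
    return {
        f"{name}_{stat}": value
        for name, contour in llds.items()
        for stat, value in functionals(contour).items()
    }
```

The output is one fixed-length feature vector per recording regardless of its duration, which is what makes LLD-plus-functionals features convenient for utterance-level classifiers such as emotion recognizers.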
-
Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.
-
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
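Word-level timestamps make it straightforward to build subtitles or alignment files from a transcript. The sketch below groups timed words into numbered SubRip (SRT) cues. It assumes, hypothetically, that each word arrives as a dict with `word`, `start` and `end` keys (in seconds) and that every word has both timestamps; check the actual output format of the tool before relying on this.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Group word dicts ('word', 'start', 'end' keys assumed) into numbered SRT cues."""
    cues = []
    for n, i in enumerate(range(0, len(words), max_words), start=1):
        chunk = words[i:i + max_words]
        text = " ".join(w["word"] for w in chunk)
        # Each cue: index line, time range, text, then a blank line between cues.
        cues.append(f"{n}\n{srt_time(chunk[0]['start'])} --> {srt_time(chunk[-1]['end'])}\n{text}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    words = [
        {"word": "hello", "start": 0.0, "end": 0.4},
        {"word": "world", "start": 0.5, "end": 0.9},
    ]
    print(words_to_srt(words))
```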
-
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
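Because a verbatim transcript keeps fillers rather than cleaning them away, they can be counted directly. One simple measure is fillers per 100 words; the sketch below computes it from a plain transcript string. The filler inventory `FILLERS` and the function name are hypothetical; a verbatim ASR system defines its own filler tokens.

```python
import re

# Hypothetical filler inventory; adjust to the tokens the ASR system actually emits.
FILLERS = {"um", "uh", "erm", "hmm"}


def filler_stats(transcript: str, fillers=FILLERS):
    """Return (filler count, word count, fillers per 100 words)."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    n_fill = sum(1 for t in tokens if t in fillers)
    rate = 100.0 * n_fill / len(tokens) if tokens else 0.0
    return n_fill, len(tokens), rate


if __name__ == "__main__":
    print(filler_stats("Um, I think, uh, we should go."))
```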
-
VoiceSauce is an application, implemented in Matlab, which provides automated voice measurements over time from audio recordings. Inputs are standard wave (*.wav) files, and the measures currently computed are: F0, formants F1-F4, H1(*), H2(*), H4(*), A1(*), A2(*), A3(*), 2K(*), 5K, H1(*)-H2(*), H2(*)-H4(*), H1(*)-A1(*), H1(*)-A2(*), H1(*)-A3(*), H4(*)-2K(*), 2K(*)-5K, Energy, Cepstral Peak Prominence, Harmonic...
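In VoiceSauce's naming, a starred measure such as H1* is a formant-corrected value; the uncorrected H1-H2 simply compares the levels (in dB) of the first two harmonics. A minimal sketch, assuming F0 is already known and the analysis window spans a whole number of pitch periods, evaluates the DFT at F0 and at twice F0 on a synthetic signal whose second harmonic has half the amplitude of the first. This is not VoiceSauce's implementation, and the function names are hypothetical.

```python
import math


def dft_magnitude(signal, freq_hz, fs_hz):
    """Magnitude of the DFT of `signal` evaluated at a single frequency."""
    real = imag = 0.0
    for n, x in enumerate(signal):
        angle = 2 * math.pi * freq_hz * n / fs_hz
        real += x * math.cos(angle)
        imag -= x * math.sin(angle)
    return math.hypot(real, imag)


def h1_minus_h2(signal, f0_hz, fs_hz):
    """Uncorrected H1-H2 in dB: level of the 1st harmonic minus the 2nd."""
    h1 = dft_magnitude(signal, f0_hz, fs_hz)
    h2 = dft_magnitude(signal, 2 * f0_hz, fs_hz)
    return 20 * math.log10(h1 / h2)


# Synthetic test signal: 800 samples at 8 kHz is exactly 20 periods of 200 Hz,
# so each harmonic falls on an exact DFT bin and there is no spectral leakage.
fs, f0, n = 8000, 200.0, 800
sig = [
    math.sin(2 * math.pi * f0 * t / fs) + 0.5 * math.sin(2 * math.pi * 2 * f0 * t / fs)
    for t in range(n)
]
print(round(h1_minus_h2(sig, f0, fs), 2))  # 6.02
```

An amplitude ratio of 2 gives 20·log10(2) ≈ 6.02 dB. On real speech the window rarely aligns with the period, which is one reason practical tools search for harmonic peaks near k·F0 rather than sampling exact frequencies.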
-
SoX is the Swiss Army Knife of sound processing utilities. It can convert audio files to other popular audio file types and also apply sound effects and filters during the conversion.
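A common SoX conversion is resampling and down-mixing audio into the 16 kHz, mono, 16-bit format many speech tools expect. In SoX, format options placed before the output file apply to the output, and SoX inserts the required rate and channel conversion itself; the output file type is inferred from its extension. The Python wrapper below only builds the command line; `sox_convert_cmd` and the file names are hypothetical, and the command is executed only if SoX is installed and the input file exists.

```python
import os
import shutil
import subprocess


def sox_convert_cmd(src, dst, rate_hz=16000, channels=1, bits=16):
    """Build a SoX command converting `src` to `dst` at the given rate, channels and bit depth.

    Options placed between the input and output file names set the output format.
    """
    return ["sox", src, "-r", str(rate_hz), "-c", str(channels), "-b", str(bits), dst]


if __name__ == "__main__":
    cmd = sox_convert_cmd("input.wav", "output_16k.wav")  # hypothetical file names
    print(" ".join(cmd))
    # Run only when SoX is on PATH and the input actually exists.
    if shutil.which("sox") and os.path.exists(cmd[1]):
        subprocess.run(cmd, check=True)
```

The equivalent shell command is `sox input.wav -r 16000 -c 1 -b 16 output_16k.wav`.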
Explore
Audio
- Child Speech (1)
- Emotional Speech (1)
- Language (1)
- Multi-Speaker (1)
Software, Processing & Utilities
Vocal Anatomy
- Mandible (1)
Tags
- speech processing (6)
- MATLAB (5)
- automatic speech recognition (ASR) (4)
- speaker diarization (3)
- audio processing (3)
- speech recognition (3)
- transcription (3)
- open-source (3)
- formant measurement (2)
- audio (2)
- convert (2)
- file format (2)
- feature extraction (2)
- speech to text (2)
- Python (2)
- speech activity detection (2)
- voice activity detection (2)
- segmentation (2)
- Praat (1)
- TextGrid (1)
- phonetics (1)
- software (1)
- spectrogram (1)
- speech analysis (1)
- child-centered audio (1)
- language development (1)
- language environment analysis (LENA) (1)
- word count estimation (1)
- record (1)
- stream (1)
- cepstral peak prominence (CPP) (1)
- harmonic-to-noise ratio (HNR) (1)
- C++ (1)
- classification (1)
- emotion recognition (1)
- speaker identification (1)
- conversational AI (1)
- overlapped speech detection (1)
- speaker embedding (1)
- speaker detection (1)
- two-class recognizer (1)
- English (1)
- Mandarin (1)
- angry (1)
- emotional speech (1)
- happy (1)
- sad (1)
- surprise (1)
- transcribed (1)
- MRI (1)
- image processing (1)
- rtMRI (1)
- vocal tract shape (1)
- child speech (1)
- formant tracking (1)
- computed tomography (CT) (1)
- mandible (1)
- area function (1)
- numerical acoustic modelling (1)
- source-filter model (1)
- tube model (1)
- vocal fold model (1)