Your search

  • openSMILE (open-source Speech and Music Interpretation by Large-space Extraction) is a complete, open-source toolkit for audio analysis, processing, and classification, targeted especially at speech and music applications such as automatic speech recognition, speaker identification, emotion recognition, beat tracking, and chord detection.

  • Whisper is a general-purpose speech recognition model. Trained on a large dataset of diverse audio, it is also a multitask model that can perform multilingual speech recognition, speech translation, and language identification.

  • WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

  • Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection

Last update from database: 28/05/2025, 04:10 (UTC)