Corpus Phonetics Tutorial

Resource type

Author/contributor

Chodroff, Eleanor (Author)

Title

Corpus Phonetics Tutorial

Abstract

Corpus phonetics has become an increasingly popular method of research in linguistic analysis. With advances in speech technology and computational power, large-scale processing of speech data has become a viable technique. A fair number of researchers have exploited these methods, yet the techniques remain elusive for many. In the words of Mark Liberman (2009), there has been “surprisingly little change in style and scale of [phonetic] research” from 1966 on, implying that the field still relies on small samples of speech data. While “big data” phonetics is not the be-all and end-all of phonetic research, larger sample sizes support more statistically sound conclusions about phonetic values in an individual or population. Furthermore, corpus research is not synonymous with big data. Rather, corpus phonetics describes a method of processing speech data whose advantages lie primarily in its computational power (its relation to big data) and efficiency. The methods and tools developed for corpus phonetics are based on engineering algorithms, primarily from automatic speech recognition (ASR), as well as simple programming for data manipulation. This tutorial aims to bring some of these tools to the non-engineer, and specifically to the speech scientist. Acoustic analysis programs such as Praat, MATLAB, and R (see the tuneR and multitaper packages) are already capable of large-scale phonetic measurement via their respective scripting languages. While the tutorial covers some phonetic processing in Praat, the primary aim is to introduce supplementary tools for phonetic processing. These tools are based on concepts and algorithms from automatic speech recognition, which allow for automatic alignment of phonetic boundaries to the speech signal. In particular, the tutorial currently covers various tools from the Kaldi Automatic Speech Recognition Toolkit, the Montreal Forced Aligner (MFA v2), Praat scripting, AutoVOT, and bash shell usage.
You can also find additional resources for Praat scripting, additional corpus phonetic tools, and legacy tutorial pages for MFA version 1, FAVE-align, and the Penn Phonetics Lab Forced Aligner in the section on Other Resources.

URL

https://www.eleanorchodroff.com/tutorial/index.html

Citation

Chodroff, E. (n.d.). Corpus Phonetics Tutorial. https://www.eleanorchodroff.com/tutorial/index.html

Teaching Resources

Tutorials

Link to this record

https://catalogue.yorvoice.york.ac.uk/catalogue/29K33J5B
