Analyze tone and plot: A Praat script for exploring, analyzing, and visualizing or plotting F0 (pitch) tracks
Along the lines of some of my previous posts on tools for linguistic analysis, I thought I'd post a quick update on a Praat script I've been working on. While not part of the series of posts I've been working on describing my tools and workflow, this particular script is a recent tool that I've developed for analyzing and exploring tonal patterns.
As I noted in my post on Praat scripting, the Praat program gives linguists a relatively easy way to investigate the properties of speech sounds, and scripts help to automate relatively mundane tasks. But regarding tone, while Praat can easily identify and measure tone, there are no comprehensive scripts that help a linguist to instrumentally investigate tones in a systematic way. Let me first explain a bit.
What is 'Tone'?
In this post, I use the word 'tone' to refer specifically to pitch that is used in a language at the word or morpheme level to distinguish the meaning of one word from another word (or words) that is otherwise segmentally identical. In linguistics, this is a "suprasegmental" feature, which essentially means that it occurs in parallel with the phonetic segments of speech created by the movement of the articulators (tongue, lips, etc.) of the human vocal apparatus.
Tone has a lot in common with pitch as used in music, which is why tones often seem to 'disappear' or be relatively unimportant in songs sung in languages that otherwise have tone (like Mandarin). Another important thing to remember is that there are quite a few different kinds of tone languages. The major tone types are 'register' and 'contour' tones - languages are often categorized as having one or the other, but some have both types. African languages tend to have register tone: this means that pitch height (measured as frequency, usually in Hz) is the primary acoustic correlate of tone in these languages. Asian languages tend to have contour tone: this means that the pitch may rise or fall across the tone-bearing unit (TBU).
There are also languages for which the primary acoustic correlate of tone (or some tones) is not pitch, but rather voice quality or duration: glottalization (e.g. Burmese), jitter, shimmer, creakiness, or length. Languages for which pitch is not the overall primary acoustic correlate of tone are often called 'register' languages (a different use of the word from 'register tone' above). Register languages are found throughout South-East Asia.
The pitch correlates of tones are relatively easy for Praat to identify, since their instrumentally observable counterpart is the fundamental frequency (F0) of a sound. Praat can extract the fundamental frequency as well as the other frequencies (formants) that are important for distinguishing vowels. Tone researchers often use a script to extract F0, and then apply various normalization techniques to the extracted data, plotting the results in a spreadsheet program like Excel or with a more robust statistical tool like R (see this paper or this post for more details, and James Stanford has done some good work on this subject - see his presentation on Socio-tonetics here).
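To make "normalization" a bit more concrete, here is a minimal Python sketch of two transformations that are commonly applied to extracted F0 values: conversion from Hz to semitones relative to a reference frequency, and z-scoring against a speaker's own mean and standard deviation. This is only an illustration under my own assumptions. It is not taken from the papers linked above, and the function names, the sample values, and the 110 Hz reference are all hypothetical.

```python
import math
from statistics import mean, pstdev


def hz_to_semitones(f0_hz, ref_hz):
    """Convert F0 values in Hz to semitones relative to ref_hz."""
    return [12 * math.log2(f / ref_hz) for f in f0_hz]


def z_score(values):
    """Standardize values against their own mean and standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A completely flat track has no spread to scale by.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical F0 samples (Hz) from a single speaker.
speaker_f0 = [110.0, 220.0, 165.0]

# 110 Hz is an arbitrary reference chosen for this example.
semitones = hz_to_semitones(speaker_f0, ref_hz=110.0)
normalized = z_score(semitones)
```

Working in semitones puts speakers with different pitch ranges on a roughly comparable (logarithmic) scale, and the z-score then removes each speaker's overall level and spread. Which combination is appropriate depends on the research question.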
One issue with this approach, however, is that often pitch tracks for each tone are time-normalized along with every other tone. So for example, in image #1 below of Cantonese tones (from this paper), all pitches are plotted across the same duration. This is fine if pitch is the primary acoustic correlate of all tones, but what if duration/length is an important correlate as well? How does a researcher begin to discover whether that is the case?
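The effect of time normalization can be shown with a small Python sketch. It resamples a pitch track at a fixed number of equally spaced points using linear interpolation. The function name and the sample tracks are hypothetical, and the sketch assumes that every sample is voiced and that the times are strictly increasing.

```python
def time_normalize(times, f0, n_points=10):
    """Resample an F0 track at n_points equally spaced positions between
    its first and last sample, using linear interpolation.

    Assumes times are strictly increasing and every sample is voiced.
    """
    if len(times) != len(f0) or len(times) < 2:
        raise ValueError("need at least two matching time/F0 samples")
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    start, end = times[0], times[-1]
    resampled = []
    j = 0  # index of the left-hand sample of the current segment
    for i in range(n_points):
        t = start + (end - start) * i / (n_points - 1)
        # Advance to the segment that contains t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        t0, t1 = times[j], times[j + 1]
        frac = (t - t0) / (t1 - t0)
        resampled.append(f0[j] + frac * (f0[j + 1] - f0[j]))
    return resampled


# Two hypothetical rising tones with the same shape but different lengths:
# 100 ms versus 300 ms.
short_track = time_normalize([0.00, 0.05, 0.10], [200.0, 210.0, 220.0], n_points=5)
long_track = time_normalize([0.00, 0.15, 0.30], [200.0, 210.0, 220.0], n_points=5)
```

After normalization the two tracks are effectively identical, even though one tone was three times as long as the other. That is exactly the information that gets lost when all tones are plotted across the same duration.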
Script for analyzing tone
The problem I tried to solve was basically this: I was investigating a language (Biate) for which no research had been done on tone, and I needed a way to quickly visualize the tonal (F0) properties of a large number of items. I also wanted to easily re-adjust the visualization parameters so that I could output a permanent image. Since I couldn't find an existing tool that could do this, and since I wanted it all to be done in a single program (rather than switching between two or more), I created the Praat script that you can find here at my GitHub page under the folder 'Analyze Tone'. If you look closely, you'll notice that this script incorporates several parts (or large chunks) of other people's work, and I'm indebted to their pioneering spirit.
There is a good bit of documentation at the page itself, so I'll leave you to get the script and explore, and will give only a brief explanation and example below. I have so far only tested this script on a Mac, so if you have a PC and would like to test it, please do so and send me feedback. Or, if you're a GitHub user, feel free to fork the repo and submit corrections.
The basic function of the script is to automate F0 extraction, normalization, and visualization using Praat in combination with annotated TextGrid/audio file pairs. Segments that have the same label in the TextGrid belonging to an audio file are treated as the same tone. You can start out by plotting all tracks, in which case all tones labeled the same will be drawn with the same color. Or you can create normalized pitch traces, in which case a single F0 track will be plotted for all tones with the same label. This script creates audio files on your hard drive as one of its steps, so make sure you have enough free space (each file is usually smaller than 50 KB, but if you have hundreds, they can add up).
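The idea of collapsing all tones with the same label into a single track can be sketched in Python as a point-by-point average. This is an illustration of the general technique only and is not a port of the Praat script. The function name and sample data are hypothetical, and the sketch assumes every track has already been resampled to the same number of points.

```python
from collections import defaultdict


def mean_tracks_by_label(tokens):
    """tokens: list of (label, track) pairs, where each track is a list of
    F0 values and all tracks have the same number of points.

    Returns {label: point-by-point mean track}.
    """
    grouped = defaultdict(list)
    for label, track in tokens:
        grouped[label].append(track)

    means = {}
    for label, tracks in grouped.items():
        n_points = len(tracks[0])
        if any(len(t) != n_points for t in tracks):
            raise ValueError(f"tracks for label {label!r} differ in length")
        # zip(*tracks) yields the values of all tracks at each time point.
        means[label] = [sum(column) / len(tracks) for column in zip(*tracks)]
    return means


# Hypothetical tokens: two instances of tone "1" and one of tone "2".
tokens = [
    ("1", [200.0, 220.0, 240.0]),
    ("1", [210.0, 230.0, 250.0]),
    ("2", [180.0, 170.0, 160.0]),
]
averaged = mean_tracks_by_label(tokens)
```

Because grouping is done purely on the label text, consistent labeling in the TextGrid matters: "1" and "1 " would end up as two different tones unless the labels are cleaned first.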
To start, simply open an audio file in Praat and annotate it with the help of a TextGrid (if you don't know how, try following this video tutorial). Create a label tier that contains only tone category labels, and label each interval whose boundaries surround a tone-bearing unit. Then save the TextGrid file in the same directory as the audio file, and run the script.
The kind of output you can get with this script is shown below. The first picture shows all the tones plotted for a single speaker, and the normalized F0 tracks for each tone are shown in the second picture. Importantly, F0 tracks are only length-normalized for each tone. You will also notice that the second image zooms in closer so that the relative differences between tones can be seen more easily.
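One way to read "length-normalized for each tone" is that each tone category keeps its own typical duration instead of being stretched to a duration shared by all categories. The Python sketch below illustrates that reading: it computes the mean duration per label and builds a separate time axis for each. This is my own illustration under that assumption, not a description of the script's internals, and the names and sample intervals are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_duration_by_label(intervals):
    """intervals: list of (label, start_s, end_s) tuples.

    Returns {label: mean duration in seconds}.
    """
    durations = defaultdict(list)
    for label, start, end in intervals:
        durations[label].append(end - start)
    return {label: mean(values) for label, values in durations.items()}


def time_axis(duration, n_points):
    """Equally spaced time points from 0 to duration (inclusive)."""
    return [duration * i / (n_points - 1) for i in range(n_points)]


# Hypothetical labeled intervals: tone "1" is short, tone "2" is long.
intervals = [("1", 0.0, 0.25), ("1", 1.0, 1.25), ("2", 2.0, 2.5)]

durations = mean_duration_by_label(intervals)
axes = {label: time_axis(d, n_points=3) for label, d in durations.items()}
```

Plotting each averaged track against its own axis keeps a long tone visibly longer than a short one, which is what lets duration show up as a possible correlate.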
Reasons for visualization
These kinds of visualizations are important for several reasons.
If you have labeled tones from multiple speakers, it is also relatively easy to see whether differences between tones are actually salient. In image #2 above, for example, we can see that most of the red tones (Tone 1) cluster with a higher (rising) contour, whereas the blue tones (Tone 2) cluster with a lower (falling) contour. A few blue tones have a rising F0 track, which may be outliers. In the lower image #3, we see that normalizing across 100 or so instances of each tone shows little effect from those rising blue tones, which strongly suggests they were outliers.
We can also see in this normalized plot that there is a clear difference between Tone 1 (red) and Tone 2 (blue), but the marginal Tone 3 (green) was questionable for most speakers that I recorded, and in the plot it is virtually identical to Tone 2. If this is true for the majority of speakers, we can conclude that Tone 3 probably doesn't actually exist in Biate as a tone correlated with pitch.
Of course all of these visualizations need qualification, and further investigation is needed by any researcher who uses this tool - your methods of data collection and analysis are crucial to ensuring that the information you are gaining from your investigation is leading to the correct analysis. But speakers of a tone language will often clearly identify categories and what you need is a tool to make these categories visible to those who may not perceive them.
I'm hoping in future to expand the coverage of this tool in order to assist with visualizing other acoustic correlates of tone, such as glottalization and creakiness. But I'm not sure when I'll get around to it. If you are a Praat script writer and would like to give it a shot, please join the collaboration and make adjustments - I'd be happy to incorporate them. For now, though, enjoy plotting pitch tracks, and let me know if you have any trouble!
In this third post about linguistic tools, I'll be discussing software that I use for acoustic analysis. Praat is one of the premier acoustic analysis tools available for computers. While there are probably commercial software products out there that are more powerful and with more bells and whistles, Praat offers some of the best ways to visualize and manipulate sound while being free and cross-platform. While it's not completely intuitive, it is quite easy to explore the sound space of a recording, especially recorded speech, and I ran a workshop on the basics of how to use it, with online materials that you can practice with if you want to learn more. There are also other great tutorials online that you should search for.
One of the best features of Praat is the ability to segment sounds using TextGrids, which are basically text files that identify sections of a sound file using timestamps. The benefit of this is that once you have properly annotated a sound file you can use scripts to automate analyses, which saves a lot of time that would otherwise be spent taking individual measurements. When I first started my PhD I spent a good amount of time learning to write Praat scripts, which turned out to be a continuation of the programming I learned when I was younger (Basic, QBasic) and a worthy introduction to programming languages like Python.
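As a small illustration of why timestamped annotations make automation easy, here is a Python sketch that takes the intervals of one tier and tabulates the duration of every labeled segment. It deliberately does not parse the TextGrid file format itself. The tier is represented as a plain list of tuples, and the function name and sample data are hypothetical.

```python
def labeled_durations(tier):
    """tier: list of (start_s, end_s, label) tuples in time order.

    Returns (label, duration_ms) rows for intervals with a non-empty label.
    """
    rows = []
    for start, end, label in tier:
        label = label.strip()
        if not label:
            # Unlabeled intervals (silence, unsegmented speech) are skipped.
            continue
        rows.append((label, round((end - start) * 1000, 1)))
    return rows


# Hypothetical tier: silence, two labeled segments, silence.
tier = [
    (0.0, 0.5, ""),
    (0.5, 0.75, "a"),
    (0.75, 1.0, "t"),
    (1.0, 1.5, ""),
]
rows = labeled_durations(tier)
```

Once the boundaries are in place, the same loop can take any other measurement over each interval, so the manual work is limited to the segmentation.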
Since this has turned out to be a post that discusses Praat scripting, I'm going to introduce/attach some of the scripts I wrote/use for acoustic analysis, and link to some of the many other places you can find scripts for your particular use case. In my case these scripts are mainly in service of documentation and description of endangered and unwritten languages, but maybe others will find them useful as well.
Automatically measuring sounds:
This script ("dur_f0_f1_f2_f3_intensity.praat") is one that I modified (originally from this script but more recently I based it on this script) to give automatic measurements of segmented sounds in a TextGrid. It is an updated version of the "msr&check…" file that I made available along with the workshop I linked to above. At the time, I had recorded several wordlists in Pnar, and I spent countless hours segmenting the sounds in each word. My thinking was that even if my segmentation wasn't precise, the sheer number of sounds and their tabulation would allow me to run valid quantitative analyses. As it worked out, this was mostly the case, and I was able to target the outliers for closer examination. I also got better at recognizing Pnar sounds from all the time I spent with the words. I have now updated this script to work nicely with the following script, which plots vowels in the Praat picture window and can produce print-publication-friendly images.
Vowel plot for formants:
Another script that I wrote/modified from other bits takes a comma-delimited (CSV) file with formant values and plots them (in the standard vowel chart format) as a Praat drawing with an oval marking their standard deviation ("draw_formants_plot_std_dev.praat"). I wrote this primarily to produce a clearer image than the one produced by JPlotFormants for my PhD thesis. Thanks also to the Praat User Group for their help with getting the script right.
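The numbers behind such a plot are simple: for each vowel, a mean and a standard deviation for F1 and for F2, which give the center and the radii of the oval. The Python sketch below computes them from CSV text. It is an illustration only. The column names ("label", "F1", "F2") and the sample values are assumptions made for this example and may not match the columns produced by the measurement script.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, stdev


def formant_summary(csv_text):
    """Compute per-vowel mean and sample standard deviation of F1 and F2.

    Expects CSV text with the (assumed) columns: label, F1, F2.
    Vowels with fewer than two tokens are skipped, since a standard
    deviation cannot be estimated from a single value.
    """
    f1 = defaultdict(list)
    f2 = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        f1[row["label"]].append(float(row["F1"]))
        f2[row["label"]].append(float(row["F2"]))

    summary = {}
    for vowel in f1:
        if len(f1[vowel]) < 2:
            continue
        summary[vowel] = {
            "F1_mean": mean(f1[vowel]),
            "F1_sd": stdev(f1[vowel]),
            "F2_mean": mean(f2[vowel]),
            "F2_sd": stdev(f2[vowel]),
        }
    return summary


# Hypothetical measurements (Hz) for two vowels.
sample = "label,F1,F2\ni,300,2300\ni,320,2200\na,700,1200\na,740,1300\n"
summary = formant_summary(sample)
```

On a standard vowel chart, F2 goes on a reversed horizontal axis and F1 on a reversed vertical axis, so each oval would be centered at (F2_mean, F1_mean).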
I recently modified this script to work nicely with the automatic measurement script above. What this means is that you can segment all your words using TextGrids, run the script above to produce a CSV, and then just run this script to plot the labeled segments from that CSV. I implemented a 'Sequential' option for the plot so you can plot one vowel at a time, which means that you can leave all the segmented consonants (and VOT annotations) in the CSV file for later analysis. Or you can remove them; it's up to you. Just keep in mind that if you do have consonants in the CSV, it WILL try to plot them on the chart unless you choose the Sequential option.
The third script linked here (“tone_analysis.praat”) I recently wrote in order to take continuous measurements of tones without normalization. This is more for exploration of tonal systems on a per-speaker basis, allowing the investigator to identify whether length is potentially a factor in the characteristics of a particular tone. I am planning to modify it to allow for percentage-based analysis (and thus normalization) of tones, which could be used by the investigator to create clearer plots once they identify the characteristics of the individual tones. But I haven’t gotten around to it yet. I'll write another blog post when I do.
As a final note, these scripts are really just the tip of the iceberg when it comes to the kind of analysis you can do in Praat. For more on Praat scripting, check out this great tutorial, Will Styler's excellent blog, the scripts he uses/maintains, these resources at UW and these from UCLA. You can also follow along with Bartlomiej Plichta as he leads you through some scripting lessons in his videos, which are very useful.
When I started my PhD program in Linguistics (language documentation and description), I had some experience with linguistic analysis, but not to the degree that I had to learn in order to complete my PhD. I had tuned my ear to be able to hear the sounds of the IPA, and had practice transcribing and learning a range of languages, but I had never analyzed an unwritten language completely by myself. During the course of my PhD I learned much more about how to analyze languages 'from the ground up', so to speak.
Along the way, I discovered that there were some excellent tools that made me much more effective and efficient at the task of documenting and describing an unwritten language. I was fortunate that I already had a good foundation in recording and processing audio from my experiences recording, mixing, and releasing my music, so the fact that the audio data I recorded would form the basis of my analysis didn't faze me. However, there was a whole other set of tools that would allow me to investigate the details of the language I planned to work on.
Each of these programs is open source or free, though some are developed for Windows and others are developed for MacOS, which might be a problem for some people. Since I grew up with DOS and Windows but then later switched to a Mac, I'm comfortable with both systems. The Apple/Mac laptop build quality was my first choice for travel and portability combined with power. I say 'was' since some of Apple's recent design choices mean I might be switching back to Windows on my next laptop. But for now I run an old Windows version on my Mac via Virtualbox or bundle Windows software in a Wine port so I can run it as a native app in MacOS.
I plan to describe each of these tools in more detail in future posts, but for now here's a list of the tools I currently use for my linguistic work:
Tools other linguists use, but that I don't use much:
I'm a linguist and singer-songwriter. I write about life, travel, language and technology.