When you discuss doing language documentation and description, one of the first things to know is that you have to collect language data. The primary source of language data is people who speak the language you're interested in, which then begs the question of how you record the data. There are some great books and papers on doing linguistic fieldwork of a documentary nature (more than what I've linked to here), but this post is focused more on the tools you use to process your data once it is recorded, as a continuation of my 'Linguistic Tools' post. I'll also plan to write a longer post on recording audio/video in the field, but for now I'll assume that you've recorded it already. I'll just briefly say that I like using a digital SLR like the Canon Rebel along with a unidirectional mic, in conjunction with a digital audio recorder like the Zoom H4N (ideally with a lapel mic of some kind).
Once you have your data recorded, the next step is to copy it to your computer for processing. Often the digital recordings will be rather large and cumbersome, and you may want to split them into smaller files, depending on how many stories/interactions you recorded. I find post-processing is important because it means you can focus on the interaction during the recording and then during processing you make notes of all the files, their content, and other metadata that will help later when you're not in the field and can't remember all the details.
In this processing stage you also want to do two very important things:
I use two programs for converting video: Media Converter and MPEG Streamclip. You could use just MPEG Streamclip (which has a Windows version), but on a Mac I find that Media Converter is much simpler/easier for reducing the size of the file, stripping out the audio, or other purposes. MPEG Streamclip is great, though, for combining multiple clips or splitting one clip into several. In each conversion you want to ensure that the video/audio quality is not compromised, depending on what you want to use it for. In my case I am mostly doing acoustic analysis, so I'm more interested in preserving the audio at CD quality (16 bit, 44.1 khz) which is the standard for acoustic analysis and archiving. In any case, since I've backed up the raw files, I can always copy from them if I mess up my working files and need to restore the quality.
To process/convert and work with audio I use Audacity - this is primarily for processing audio, not for acoustic analysis. Audacity supports a large range of encodings and formats, and you can select portions of the sound file to do basic processing like boosting the signal, removing noise, etc. These are generally not the best things to do to an audio signal, but they can be useful. In my case, this is particularly for when I'm playing the audio back and need to hear what someone said in the background during a conversation, or do other kinds of manipulations.
I can't stress enough the importance of backing up data and copying your data files to a new (staging) folder. This really ensures that you can always rewind the clock and reset, while being confident in exploring the data itself in your working folder. This should become an important part of your workflow so that it is second nature. In some cases we will make mistakes, but understanding the importance of backing up and creating metadata for your backups will help to mitigate perhaps catastrophic events. Happy converting!
Linguistic Tools (Mac/PC)
When I started my PhD program in Linguistics (language documentation and description), I had some experience with linguistic analysis, but not to the degree that I had to learn in order to complete my PhD. I had tuned my ear to be able to hear the sounds of the IPA, and had practice transcribing and learning a range of languages, but I had never analyzed an unwritten language completely by myself. During the course of my PhD I learned much more about how to analyze languages 'from the ground up', so to speak.
Along the way, I discovered that there were some excellent tools that made me much more effective and efficient at the task of documenting and describing an unwritten language. I was fortunate that I already had a good foundation in recording and processing audio from my experiences recording, mixing, and releasing my music, so the fact that the audio data I recorded would form the basis of my analysis didn't phase me. However, there were another whole set of tools that would allow me to investigate the details of the language I planned to work on.
Each of these programs is open source or free, though some are developed for Windows and others are developed for MacOS, which might be a problem for some people. Since I grew up with DOS and Windows but then later switched to a Mac, I'm comfortable with both systems. The Apple/Mac laptop build quality was my first choice for travel and portability combined with power. I say 'was' since some of Apple's recent design choices mean I might be switching back to Windows on my next laptop. But for now I run an old Windows version on my Mac via Virtualbox or bundle Windows software in a Wine port so I can run it as a native app in MacOS.
I'll plan to describe each of these tools in more detail in future posts, but for now here's
A list of the tools I currently use for my linguistic work:
Tools other linguists use, but that I don't use much:
I'm a linguist and singer-songwriter. I write about life, travel, language and technology.