I realize that some of my posts haven't been as clear as they could be. Specifically, I've talked a lot about interlinearized texts, but what does that actually mean? The thing about language is that when you're discussing specific aspects of it, it helps if the reader actually knows what you're talking about, so examples are useful. When you're discussing an unwritten language, this has to be taken to a whole new level.
When I'm discussing examples in Pnar, I need four levels of representation, as in the example below. On the left, the numbered lines represent (1) the local orthography, (2) the phonetic/phonemic representation in IPA, (3) the word-for-word translation or English gloss, and (4) the free translation that actually gives you the English meaning.
So on the left we have the four levels of representation, but you'll notice that the items on each line don't quite line up. This can be confusing, particularly in long examples. Interlinearization aligns each element with its counterpart on the line below.
One way linguists do this is by creating tables, which have to be edited individually for each example. This is what you have to do in MS Word, unfortunately. Another way is to use a typesetting program called LaTeX - that's how I produced the nicely formatted example on the right. A further convention is to have the local writing system italicized and non-interlinearized.
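For the curious, here's a minimal sketch of what the LaTeX source for an interlinearized example can look like. I'm using the gb4e package here purely for illustration (other packages do the same job), and the sentence is a generic English-glossed Indonesian example rather than actual Pnar data:

```latex
\documentclass{article}
\usepackage{gb4e}  % one common interlinear-glossing package
\begin{document}
\begin{exe}
\ex
% \gll aligns the next two lines word by word;
% \glll would add a third aligned line (e.g. for IPA)
\gll Mereka di Jakarta sekarang.\\
they in Jakarta now\\
\glt `They are in Jakarta now.'
\end{exe}
\end{document}
```

Each word on the first line is set directly above its gloss on the second, and the free translation goes on its own unaligned line - exactly the four-level layout described above, minus the IPA tier.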
Notice that the glosses on the third line are not always translation equivalents; sometimes they are grammatical abbreviations for function words. Here, 'ALL' is an abbreviation for 'allative', a traditional term for a marker on nouns that indicates the noun is a 'goal', i.e. what another noun is moving towards.
Hopefully that clears things up a bit. To read more about interlinearized linguistic examples, this Wikipedia page should help.
So my team did not make it through to the next round. They played a good game against Portugal last night, but it just wasn't good enough. Too much ball control, not enough finishing power. A ton of missed chances. Disappointing, but so it goes sometimes. Hopefully the next World Cup will be a different story.
Now, just to clarify: Ghana is the country I was born in. One of my Chinese friends was shocked that I wasn't supporting the USA in the World Cup - why would I even consider supporting a country other than the country of my passport? I explained that in the USA we like to support the 'underdog', a concept that took a bit of explaining. I spent a good five minutes trying to explain why Americans like to root for teams that qualify as 'the little guy'.
Now Ghana is definitely not completely the little guy when it comes to soccer (football for all the non-Americans), but as a nation when compared to the USA, which is pretty much the biggest on the block, they are. And as someone born and raised in the African nation, I guess I have more call than most to support them on the international stage.
But this definitely does not mean that I dislike the USA - not at all. In fact, I am really glad to be a citizen, am really glad they made it into the World Cup, and hope at some point they get to hoist the trophy. But in a sports matchup between them and Ghana, I'd have to support Ghana all the way. And now that the US is through, I still have a team to support...
As I've been writing code to get the computer to format my text properly, I've run into some issues. It's got me thinking... You know how computers think... wait, you do?! No you don't! Computers don't think, unfortunately - that's the problem. Computers aren't good at connecting the dots or making inferences the way humans are. All they can do is connect the dots a human tells them to. There's the rub: the computer is only as smart as you are.
Fortunately, when I'm writing a program to go through my 80,000+ words of text (times five, since there are four lines of interlinearization plus one of free translation - over 400,000 words' worth), which it parses in an instant, the computer tells me when it fails. Or rather, since I'm writing the code, when I FAIL. You know exactly where you stand with a computer, because there's only one right way for code to run: all the processes have to be logical and well-formed according to the rules of the code's architecture.
I must say I'm glad that life isn't that way. Yes, there are principles that can be recognized and lived by. You generally get out of relationships, study, work, etc. what you put into them... But there's no single perfect way to run. The world isn't a giant piece of code architecture, and your life isn't a logical process from one step to the next. Life is dynamic. It can change, and be changed, by a small movement in one direction or another. And failure is just the beginning of a new direction.
On the way back to the office from dinner the other night (see how much time this coding takes if I go back to the office after dinner!) I was talking with one of my friends about job prospects and how life changes. There's a lot of uncertainty, but I said that one thing I've learned is to figure out what is important to you and make it part of your life. I guess I'm still figuring...
This past week I've been attending a workshop on the linguistic notion of Affectedness that my co-supervisor Frantisek Kratochvil organized. It has really helped me think about possible ways this feature could be at work in Pnar verbal constructions. And if you didn't understand that sentence feel free to ask and I'll try to explain it better. My brain has been fried most days this week.
While at the conference and in the evenings I've been working on organizing my linguistic database. A few weeks ago my friend Matt showed me how to use Python scripting to format and search the texts output by my Toolbox database.
Toolbox allows for interlinearization of linguistic data, which is the standard for examples in linguistics papers and lets people who don't know anything about the language see its grammatical structure. An interlinearized example usually includes a line of text in the local orthography, followed by an IPA (International Phonetic Alphabet) representation, a line of word-for-word glosses (translations), and a free translation. Glosses and free translation are usually in English.
The script Matt wrote (with my input) allows for regex (regular expression) searching and output. So in my corpus I can find all the verbs followed by nouns, for example, or all the verbs preceded by the form 'ka', and output their context.
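To give a flavor of how this kind of search works, here's a toy sketch. Toolbox exports plain text with backslash field markers, so the whole job reduces to regex matching over records. The markers (\tx, \gl, \ft) and the made-up words below are placeholders for illustration, not real Pnar data or Matt's actual script:

```python
import re

# Toy stand-in for Toolbox's plain-text output: records are separated
# by blank lines, and each field line starts with a backslash marker.
# The markers and words are placeholders, not real Pnar data.
corpus = """\\tx ka jam pale
\\gl 3SG.F eat fish
\\ft She eats fish.

\\tx pale jam
\\gl fish eat
\\ft Fish eat."""

def records(text):
    """Split the corpus into blank-line-separated records."""
    return text.strip().split("\n\n")

def find_form(text, form):
    """Return every record whose \\tx line contains `form` as a whole word."""
    pattern = re.compile(r"\\tx .*\b" + re.escape(form) + r"\b")
    return [rec for rec in records(text) if pattern.search(rec)]

for rec in find_form(corpus, "ka"):
    print(rec)  # prints the matching record, i.e. the form in its context
```

Because the gloss and translation lines travel with each record, a hit on the text line automatically brings its grammatical context along - which is what makes this kind of corpus search so handy.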
The script I wrote this week (with his input) takes the whole Toolbox corpus (or a portion thereof) and reformats it so that I can read it with LyX, a GUI front end to the popular but obtuse typesetter LaTeX. I still have a bit of work to do, but basically it lets me turn my database of 90,000+ words into a nicely typeset, interlinearized corpus as a PDF.
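The core of such a conversion can be sketched in a few lines: read a record's backslash-coded fields into a dictionary, then emit the corresponding LaTeX. The field markers (\tx, \ph, \gl, \ft), the gb4e target commands, and the toy sentence are all my assumptions for illustration - the real script handles much more, and real Pnar data:

```python
def toolbox_to_gb4e(record):
    """Map one backslash-coded record onto a gb4e interlinear example.

    Field markers (\\tx, \\ph, \\gl, \\ft) are assumed placeholders.
    """
    fields = {}
    for line in record.strip().splitlines():
        marker, _, value = line.partition(" ")
        fields[marker.lstrip("\\")] = value
    return "\n".join([
        r"\ex",
        r"\glll " + fields["tx"] + r"\\",  # orthography line
        fields["ph"] + r"\\",              # IPA line
        fields["gl"] + r"\\",              # word-for-word glosses
        r"\glt `" + fields["ft"] + "'",    # free translation
    ])

# A toy record (placeholder words, not actual Pnar):
record = """\\tx ka jam pale
\\ph ka dʒam pale
\\gl 3SG.F eat fish
\\ft She eats fish."""

print(toolbox_to_gb4e(record))
```

Run over the whole corpus, record by record, that's essentially how thousands of Toolbox entries become one long stream of typeset interlinear examples.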
After excluding about two hours of data from my corpus because of parsing issues in Toolbox, the resulting PDF was over 700 pages of interlinearized examples alone, with no other formatting. I don't think I'll be including it all in the dissertation I plan to submit in August, but it's amazing to have such a simple tool for outputting my data in a readable format.
I love technology...
I'm a linguist and singer-songwriter. I write about life, travel, language and technology.