Things have been quite busy in the year or so since I last posted. My wife and I moved to Zurich with our 4-month-old, I helped organize a workshop on word order here at UZH, a proceedings volume from the last ICAAL that I co-edited was published in December, we attended a bunch of conferences (AAS in Denver, SEALS in Tokyo, ICHL in Canberra), I lost my passport on the way to ICAAL in Chiang Mai (I've since applied for another), and various other things happened.
In the meantime we have also been hard at work digitizing, transcribing, and annotating data from multiple Austroasiatic languages. Alongside this effort we have been developing semi-automated ways of comparing clauses and identifying possible correspondences for syntactic reconstruction. The field of syntactic reconstruction has been gaining traction over the past decade as a viable area of study in historical linguistics (see here, here, and here for some work), and it's exciting to be working on ways that computers can help us with this task.
One interesting observation is that our methodology does actually identify crosslinguistic structural similarities. We can see this in the following plot, which compares the number of clauses deemed 'similar' by our method in two datasets (thanks to Damian Blasi for suggesting this means of assessing our method). The first dataset is our current dataset, with over 9,000 annotated clauses. Across 10 languages in 5 subgroups, this results in over 23 million pairwise comparisons. The second dataset is composed of the same clauses, but with the elements in each clause randomized by language.
The plotted lines show the distribution of similarity judgments across each dataset. Using our method for clause comparison, the randomized dataset shows a normal distribution, which is what we expect from unstructured data. With the same method, however, the dataset of annotated clauses in Austroasiatic languages shows a non-normal distribution.
This tells us that the real language data is structured AND that our method for measuring similarity picks up on this structure, identifying a higher degree of similarity between clauses in languages that we know are related.
This raises a lot of new questions and highlights the need for more testing to identify the best way of assessing similarity between clauses in a systematic and linguistically appropriate manner. Fortunately our project is not yet over!
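For readers curious how a shuffled-baseline check like the one described above works in principle, here is a minimal sketch in Python. It is not our actual pipeline: the toy clauses, the language names, and the similarity measure (the ratio from Python's difflib) are all stand-ins chosen for illustration, and I assume here that "randomized" means shuffling the elements within each clause.

```python
import random
from difflib import SequenceMatcher
from itertools import combinations
from statistics import mean

# Toy data: hypothetical clause annotations, NOT the project's real data.
clauses = {
    "lang_a": [["S", "V", "O"], ["S", "V"], ["S", "NEG", "V", "O"]],
    "lang_b": [["S", "V", "O"], ["S", "V", "O", "ADV"], ["S", "NEG", "V"]],
}

def similarity(a, b):
    """Stand-in similarity: difflib's ratio over element sequences (0.0 to 1.0)."""
    return SequenceMatcher(None, a, b).ratio()

def shuffled(data, rng):
    """Return a copy of the data with the elements of each clause shuffled."""
    return {lang: [rng.sample(c, len(c)) for c in cs] for lang, cs in data.items()}

def cross_language_scores(data):
    """Score every pair of clauses drawn from two different languages."""
    scores = []
    for lang_a, lang_b in combinations(sorted(data), 2):
        for a in data[lang_a]:
            for b in data[lang_b]:
                scores.append(similarity(a, b))
    return scores

rng = random.Random(0)  # fixed seed so the shuffle is repeatable
real = cross_language_scores(clauses)
baseline = cross_language_scores(shuffled(clauses, rng))
print(f"real mean: {mean(real):.3f}, shuffled mean: {mean(baseline):.3f}")
```

With real data you would plot the two score lists as distributions rather than just comparing means; the point of the sketch is only that the same scoring function is run over the annotated clauses and over their shuffled counterparts.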
This website is newly updated! I just redesigned the layout and will be making it a bit more writing-oriented over the coming days and weeks. The reason is that I've realized that while I still enjoy writing and producing music (and you can still link to all my music-related content via the navigation menu), my focus and life/work trajectory have really shifted.
Another reason I haven’t updated this site more regularly and done more blogging is that at the end of 2015 I thought the AI website builder of the future was right around the corner (thegrid.ai). As you can read from this post, I (and so many other people) was wrong. I can’t really complain though - I think I got quite a lot from what I spent on the product, including a curiosity about A.I. and an understanding of how far we have to go before computers defeat humans and run our lives. I also got a website that I’m too embarrassed to link here because it basically looks like a really bad Tumblr account... like my old (now essentially defunct) Tumblr. Anyway, I’ll keep checking my AI website periodically, and maybe I’ll finally be able to move everything from here to that site and my life will achieve some semblance of integration.
As I've been writing code to get the computer to format my text properly, I've run into some issues. It's got me thinking... You know how computers think... wait, you do?! No you don't! Computers don't think, unfortunately, and that's the problem. Computers aren't good at connecting the dots or making inferences like humans are. All they can do is connect the dots that a human tells them to. There's the rub. The computer is only as smart as you are.
Fortunately, when I'm writing a program to go through my 80,000+ words of text (times 6, since each line of text comes with 4 lines of interlinearization plus one of free translation = 480,000), which it parses in an instant, the computer tells me when it fails. Or rather, since I'm writing the code, when I FAIL. You know exactly where you stand with a computer, because there's only one right way for code to run, and that's if all the processes are logical and well-formed according to the rules of the code's architecture.
I must say I'm glad that life isn't that way. Yes, there are principles that can be recognized and lived. You generally receive from life based on what you put into relationships, study, work, etc... But there's no single perfect way to run. It's not like the world is a giant piece of code architecture and your life is a logical process from one thing to another. Life is dynamic. It can change and be changed by a small movement in one direction or another. And failure is just the beginning of a new direction.
On the way back to the office from dinner the other night (see how much time this coding takes if I go back to the office after dinner!) I was talking with one of my friends about job prospects and how life changes. There's a lot of uncertainty, but I said that one thing I've learned is to figure out what is important to you and make it part of your life. I guess I'm still figuring...
About me
I'm a linguist and singer-songwriter. I write about life, travel, language and technology.
Archives
January 2022