Things have been quite busy in the year or so since I last posted. My wife and I moved to Zurich with our 4-month-old, I helped organize a workshop on word order here at UZH, a proceedings volume from the last ICAAL that I co-edited was published in December, we attended a bunch of conferences (AAS in Denver, SEALS in Tokyo, ICHL in Canberra), I lost my passport on the way to ICAAL in Chiang Mai (I've since applied for another), and various other things happened.
In the meantime we have also been hard at work digitizing, transcribing, and annotating data from multiple Austroasiatic languages. Alongside this effort we have been developing semi-automated ways of comparing clauses and identifying possible correspondences for syntactic reconstruction. The field of syntactic reconstruction has been gaining traction over the past decade as a viable area for study in historical linguistics (see here, here, and here for some work), and it's exciting to be working on ways that computers can help us in this task.
One interesting observation we can make is that our methodology does actually identify crosslinguistic structural similarities. We can see this in the following plot, which compares the number of clauses deemed 'similar' by our method in two datasets (thanks to Damian Blasi for suggesting this means of assessing our method). The first dataset is our current dataset with over 9,000 clauses annotated. Across 10 languages in 5 subgroups, this results in over 23 million pairwise comparisons. The second dataset is composed of the same clauses, but with the elements in each clause randomized by language. The plotted lines are the distribution of similarity judgments across each dataset.
We can see that, using our method for clause comparison, the randomized dataset shows a normal distribution, which is what we expect from unstructured data. With the same method, however, the dataset of annotated clauses in Austroasiatic languages shows a non-normal distribution. This tells us that the real language data is structured AND that our method for measuring similarity picks up on this structure, identifying a higher degree of similarity between clauses in languages that we know are related.
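For readers who want a concrete picture of this kind of check, here is a minimal Python sketch of the general idea. It is not our actual pipeline: the element labels, the toy clauses, and the similarity function (the standard library's difflib.SequenceMatcher ratio) are all stand-ins chosen for illustration, and I am assuming that "randomized" means shuffling the order of elements within each clause.

```python
import random
from difflib import SequenceMatcher
from itertools import combinations

# Toy data: (language, clause as an ordered list of element labels).
# Languages, labels, and clauses are hypothetical, not from the real dataset.
clauses = [
    ("lang_a", ["S", "V", "O"]),
    ("lang_a", ["S", "NEG", "V", "O"]),
    ("lang_b", ["S", "V", "O"]),
    ("lang_b", ["S", "V", "O", "ADV"]),
    ("lang_c", ["S", "NEG", "V", "O"]),
    ("lang_c", ["S", "V", "O", "ADV"]),
]


def similarity(a, b):
    """Stand-in similarity score in [0, 1]; the project's real measure differs."""
    return SequenceMatcher(None, a, b).ratio()


def cross_language_scores(data):
    """Score every pair of clauses that come from different languages."""
    return [
        similarity(c1, c2)
        for (l1, c1), (l2, c2) in combinations(data, 2)
        if l1 != l2
    ]


def shuffle_elements(data, rng):
    """Baseline: keep each clause's elements but randomize their order (assumption)."""
    shuffled = []
    for lang, clause in data:
        elements = list(clause)
        rng.shuffle(elements)
        shuffled.append((lang, elements))
    return shuffled


rng = random.Random(0)  # fixed seed so the baseline is reproducible
real_scores = cross_language_scores(clauses)
random_scores = cross_language_scores(shuffle_elements(clauses, rng))
```

Plotting the distribution of real_scores against random_scores (for example as two histograms) gives the kind of comparison described above. With only a handful of toy clauses the two distributions will not look like ours, so treat this purely as an outline of the procedure.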
This raises a lot of new questions and highlights the need for more testing to identify the best way of assessing similarity between clauses in a systematic and linguistically appropriate manner. Fortunately our project is not yet over!
At the end of last year the ICAAL 7 proceedings volume was published by University of Hawai'i Press as a special issue of the Journal of the South-East Asian Linguistics Society (JSEALS). The 8th ICAAL was just held in Chiang Mai, so I think it's worth writing a bit about the 7th ICAAL proceedings, even at such a late date. The issue was edited by Felix Rau (University of Cologne) and me, and I wrote more details about it in a Twitter thread when it was first released. It was my first attempt at (co-)editing an issue/volume, and it was a good experience, made more so by an excellent co-editor, generally timely responses from authors and reviewers alike, and the support/advice of Mark Alves, Paul Sidwell, and Mathias Jenny. It was such a good experience, in fact, that Felix and I have agreed to edit the proceedings from ICAAL 8.
I won't go into great detail about the papers, since the issue is open-access, and anyone interested can follow the link above and download/read the abstracts/papers at their leisure. But I do want to highlight a few general points about the encouraging direction it shows for Austroasiatic studies. There is also an extensive backstory to the International Conference on Austro-Asiatic Linguistics (ICAAL) that provides a bit more context. One take on at least part of that backstory can be found here, and more can be found here.
The seventh International Conference on Austro-Asiatic Linguistics (attendees pictured above) was held in Kiel, Germany in 2017. One point that we note in the introduction to these papers is that this is only the fourth published proceedings volume since the conference's inception in 1973. Over a span of 40+ years, 7 ICAAL meetings have been held, and proceedings have been published for just over half. There are various reasons for this, but we hope that this special issue is part of a trend.
Bolstering this trend is the fact that the majority of the papers in this special issue are by relatively young linguists. In the field of Austroasiatic linguistics there are well-known and well-cited scholars such as Harry Shorto, Gerard Diffloth, Philip Jenner, Eugenie Henderson, Norman Zide, Geoffrey Benjamin, Ilia Peiros, Patricia Donegan and Michel Ferlus, but the majority of their work was done in the 60s-90s. Some of these scholars have passed on, and only a few scholars such as Niclas Burenhult, Nicole Kruspe, Paul Sidwell, Greg Anderson, Mark Alves, Nathan Badenoch and Mathias Jenny have 'carried the torch', as it were, and worked to extend and expand our knowledge of the Austroasiatic languages, especially in the last 10 years. Thanks to their efforts, however, and especially to their mentorship, there is a growing number of young scholars who are working on these languages, providing important insights and datasets for other scholars.
The focus on data and the attempt to make primary data accessible is a particularly heartening feature of this issue. The data is accessible either through online, open-access repositories or through included examples, tables, or appendices. While previous work on Austroasiatic languages included such data, the inclusion of online repositories follows a growing trend in the social sciences where underlying data can be assessed and results can be replicated by other scientists, or an analysis can be contradicted or refined. The main benefit of this 'open science' approach is that the focus shifts away from the individual and how well they argue for a position, and toward what the best interpretation of the data is and whether the data supports the individual's argument. This is only possible when the data is accessible.
I'm a linguist and singer-songwriter. I write about life, travel, language and technology.