Hiram Ring

Computer-assisted syntactic reconstruction

9/16/2019

2 Comments

 
Things have been quite busy over the year or so since I last posted. My wife and I moved to Zurich with our 4-month-old, I helped organize a workshop on word order here at UZH, a proceedings volume from the last ICAAL that I co-edited got published in December, we attended a bunch of conferences (AAS in Denver, SEALS in Tokyo, ICHL in Canberra), I lost my passport on the way to ICAAL in Chiang Mai (I've since applied for another), and various other things happened.

In the meantime we have also been hard at work digitizing, transcribing, and annotating data from multiple Austroasiatic languages. Alongside this effort we have been developing semi-automated ways of comparing clauses and identifying possible correspondences for syntactic reconstruction. The field of syntactic reconstruction has been gaining traction over the past decade as a viable area for study in historical linguistics (see here, here, and here for some work), and it's exciting to be working on ways that computers can help us in this task.

One interesting observation we can make is that our methodology does actually identify crosslinguistic structural similarities. We can see this in the following plot, which compares the number of clauses deemed 'similar' by our method in two datasets (thanks to Damian Blasi for suggesting this means of assessing our method). The first dataset is our current dataset with over 9,000 clauses annotated. Across 10 languages in 5 subgroups, this results in over 23 million pairwise comparisons. The second dataset is composed of the same clauses, but with the elements in each clause randomized by language. The plotted lines are the distribution of similarity judgments across each dataset.
Picture: distribution of similarity judgments for the annotated and randomized datasets
We can see that using our method for clause comparison the randomized dataset shows a normal distribution - which is what we expect from unstructured data. With the same method, however, the dataset of annotated clauses in Austroasiatic languages shows a non-normal distribution. This tells us that the real language data is structured AND that our method for measuring similarity picks up on this structure, identifying a higher degree of similarity between clauses in languages that we know are related.
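As a rough illustration of this kind of comparison (not the project's actual code), the sketch below scores every cross-language pair of clauses and then repeats the scoring after shuffling the elements inside each clause. Everything in it is an assumption made for the example: the toy clauses and labels are invented, `difflib.SequenceMatcher` is only a stand-in for whatever similarity measure the project uses, and the names `similarity`, `cross_language_scores`, `shuffle_within_clauses`, and `histogram` are hypothetical.

```python
import random
from difflib import SequenceMatcher
from itertools import combinations

# Invented toy data: (language, ordered element labels of one clause).
CLAUSES = [
    ("lang_a", ["S", "V", "O"]),
    ("lang_a", ["S", "AUX", "V", "O"]),
    ("lang_b", ["S", "V", "O"]),
    ("lang_b", ["S", "V", "O", "ADV"]),
    ("lang_c", ["S", "NEG", "V", "O"]),
    ("lang_c", ["V", "S"]),
]


def similarity(a, b):
    """Stand-in similarity in [0, 1] between two label sequences (assumption)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cross_language_scores(clauses):
    """Score every pair of clauses that come from different languages."""
    return [
        similarity(x, y)
        for (lang_x, x), (lang_y, y) in combinations(clauses, 2)
        if lang_x != lang_y
    ]


def shuffle_within_clauses(clauses, seed=0):
    """Build the baseline: same elements per clause, but in random order."""
    rng = random.Random(seed)
    shuffled = []
    for lang, elements in clauses:
        elements = list(elements)  # copy so the original data is untouched
        rng.shuffle(elements)
        shuffled.append((lang, elements))
    return shuffled


def histogram(scores, bins=5):
    """Count scores per equal-width bin over [0, 1]; 1.0 goes in the last bin."""
    counts = [0] * bins
    for s in scores:
        counts[min(int(s * bins), bins - 1)] += 1
    return counts


real = cross_language_scores(CLAUSES)
baseline = cross_language_scores(shuffle_within_clauses(CLAUSES))
print("annotated ", histogram(real))
print("randomized", histogram(baseline))
```

With a real dataset, the two histograms would correspond to the two plotted lines: one for the annotated clauses and one for the randomized baseline. A toy sample this small says nothing about the shape of either distribution.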

This raises a lot of new questions and highlights the need for more testing to identify the best way of assessing similarity between clauses in a systematic and linguistically appropriate manner. Fortunately our project is not yet over!

ICAAL 7 Proceedings volume

9/12/2019

0 Comments

 
Picture
Planting rice near Jowai, NE India
At the end of last year the ICAAL 7 proceedings volume was published by University of Hawai'i Press as a special issue of the Journal of the South-East Asian Linguistics Society (JSEALS). The 8th ICAAL was just held in Chiang Mai, and so I think it's worth writing a bit about the 7th ICAAL proceedings, even at such a late date. The issue was edited by Felix Rau (University of Cologne) and me, and I wrote more details about it in a Twitter thread when it was first released. It was my first attempt at (co-)editing an issue/volume, and it was a good experience, made more so by an excellent co-editor, generally timely responses from authors and reviewers alike, and the support/advice of Mark Alves, Paul Sidwell, and Mathias Jenny. It was such a good experience, in fact, that Felix and I have agreed to edit the proceedings from ICAAL 8.

I won't go into great detail about the papers, since the issue is open-access, and so anyone interested can follow the link above and download/read the abstracts/papers at their leisure. But I do want to highlight a few general points about the encouraging direction it shows for Austroasiatic studies. There is also an extensive backstory to the history of the International Conference on Austro-Asiatic Linguistics (ICAAL) that provides a bit more context. One take on at least part of that backstory can be found here, and more can be found here.
Picture
Presenters and attendees of the 7th ICAAL in Kiel, Germany
The seventh International Conference on Austro-Asiatic Linguistics (attendees pictured above) was held in Kiel, Germany in 2017. One point that we note in the introduction to these papers is that this is only the fourth published proceedings volume since the conference's inception in 1973. Over a span of 40+ years, 7 ICAAL meetings have been held, and proceedings have been published for just over half. There are various reasons for this, but we hope that this special issue is part of a trend.

Bolstering this trend is the fact that the majority of the papers in this special issue are by relatively young linguists. In the field of Austroasiatic linguistics there are well-known and well-cited scholars such as Harry Shorto, Gerard Diffloth, Philip Jenner, Eugenie Henderson, Norman Zide, Geoffrey Benjamin, Ilia Peiros, Patricia Donegan and Michel Ferlus, but the majority of their work was done in the 60s-90s. Some of these scholars have passed on, and only a few scholars such as Niclas Burenhult, Nicole Kruspe, Paul Sidwell, Greg Anderson, Mark Alves, Nathan Badenoch and Mathias Jenny have 'carried the torch', as it were, and worked to extend and expand our knowledge of the Austroasiatic languages, especially in the last 10 years. Thanks to their efforts, however, and especially to their mentorship, there is a growing number of young scholars who are working on these languages, providing important insights and datasets for other scholars.

The focus on data and the attempt to make primary data accessible is a particularly heartening feature of this issue. The data is accessible either through online, open-access repositories or through included examples, tables, or appendices. While previous work on Austroasiatic languages included such data, the inclusion of online repositories follows a growing trend in the social sciences where underlying data can be assessed and results can be replicated by other scientists, or an analysis can be contradicted or refined. The benefits of this 'open science' approach are mainly that the focus is taken off of the individual and whether they argue well for a position, and instead the focus is placed on what the best interpretation of the data is, and whether the data supports the individual's argument. This is only possible when the data is accessible.

Moving to Zurich

2/21/2018

1 Comment

 
The last post was a bit of a brain dump to make sure I didn't forget a few lessons I learned, in part because I knew I was quitting the job that involved doing ML-type things. While I was working there of course I learned a lot and (I think) acquitted myself pretty well, but language processing and machine learning are not really what I spent 4 years doing for my PhD. Python is a programming language that I picked up to make my work in grammatical description and syntax easier, and while I find ML (and programming) pretty interesting, my main interest lies in understanding how languages work through comparison, with the ultimate goal of reconstructing linguistic structures and (hopefully) prehistory.

A year and a half ago or so I started working on a grant proposal for that exact thing with some researchers at the University of Zurich. This is a relatively young department that is doing some really cool research in typology, processing, and language acquisition from a corpus-based perspective on multiple languages (both Indo-European and non-IE families/phyla). At the same time historical linguistics is a huge focus in the department, as is modeling language change. This is super exciting because I take the perspective that language is spoken by individuals in communities who acquire language from their forebears (history) and use it as a tool for communication (processing), which gives rise to statistical tendencies that all languages share (typology). Since it is individuals using language, this is done in an idiosyncratic way, but since language is learned and guided by principles of processing, the only way to get at both the commonality and the uniqueness of language is by investigating actual language corpora (recordings, transcriptions, etc). Of course the story of how languages change is much more complex and involves many more factors, but be that as it may, this is a great place to be.
Picture: Lake Zurich from the hill above the university
So, long story short, we found out last October that the grant had been funded, and the family and I started making plans to move to Zurich. More on that later, perhaps. With this project, our goal at the moment is to build a database of Austroasiatic language corpora that we can then investigate for all sorts of interesting phenomena, but focusing (initially at least) on word order. By comparing word order in multiple languages of the same family we intend to make an effort toward reconstructing the form of the parent languages from which the present-day spoken languages diverged, and also to identify language contact and interaction effects to contribute to discussions about the development of word order patterns cross-linguistically.

I've been here only a few weeks, and our first year of the project involves a lot of data collection, so I'll be traveling quite a bit and having to learn some more languages (working on Swiss German and Burmese right now). But even in the first few weeks we've made some progress and I'm excited to share more as the research continues. I'm definitely doing Python programming, and it looks like I'll learn some JavaScript for various tools we intend to build. Maybe I'll even get to use machine learning at some point.

    About me

    I'm a linguist and singer-songwriter. I write about life, travel, language and technology.
