Documentation of Gawarbati – project completed

Gawarbati is a peculiar Indo-European language, spoken by approximately 15,000 people in the mountainous and partly inaccessible border area between northern Pakistan and northeastern Afghanistan. It has been the subject of extensive documentation and exploration in a research project, now completed.

Gawarum: The area along the Kunar River where the Gawarbati language is spoken. Photo: Muhammad Ali Shah

The project Gawarbati: Documenting a vulnerable linguistic community in the Hindu Kush was carried out with Professor Henrik Liljegren as its principal investigator.

The Swedish-Pakistani team, consisting of seven people in total, made 22 hours of video and audio recordings, distributed over approximately 200 recorded sessions and involving more than 150 language users.

The recorded material represents many different topics, situations, ages and professional groups. It includes everything from folk tales, poetry and historical stories to in-depth interviews, lectures and group discussions. 

All of this material has been segmented and transcribed word for word using the annotation tool ELAN, both in IPA (the International Phonetic Alphabet) and in the Arabic-based writing system that has recently been developed for Gawarbati. It was subsequently translated into English and Urdu (the national language of Pakistan).

A large part of the material was further processed using software specially developed for the purpose, resulting in a digital, searchable language bank and a lexical database that currently includes 7,500 entries!

A language bank for researchers – and for the speakers of Gawarbati

Henrik Liljegren, Professor at the Department of Linguistics, on a visit to Pakistan. Photo: Henrik Liljegren

– The language bank, which will be expanded with the remaining collected material and further improved, can be studied for a long time and become useful for linguists and researchers in other disciplines – as well as for the speakers themselves, says Henrik Liljegren. 

One of the Pakistani team members has also digitized a locally produced dictionary comprising more than 20,000 entries. This dictionary is still in need of further systematization and editing, but will constitute an important complement to the corpus-based lexicon. 

– Through the project and the linguistic analysis that has been carried out, we have gained a better understanding of how the language's grammar is structured and what its vocabulary looks like. We are currently working on a text collection and an online dictionary for Gawarbati.

The project Gawarbati: Documenting a vulnerable linguistic community in the Hindu Kush was carried out between 2021 and 2024, in collaboration between the Department of Linguistics (Stockholm University) and the Islamabad-based language resource center Forum for Language Initiatives.  

Anastasia Panova delivers her mid-term seminar, October 2024. Photo: Henrik Liljegren

A doctoral dissertation and several articles coming up

  • Anastasia Panova, PhD student at the Department of Linguistics, is writing her doctoral dissertation within the project. It contains a comprehensive analysis of Gawarbati and a description of all essential parts of the language's grammatical system.
  • Several other scientific articles have also been written within the framework of the project and will be published shortly. They present research findings on, for example, the sound system and information structure of Gawarbati.
  • Based on collected and analyzed material, the project contributes – in the form of presentations, submitted manuscripts and already approved publications – to a better understanding of Gawarbati’s relationship to related languages (especially other Indo-Aryan languages spoken in the Hindu Kush region) and its historical development. 
  • The project also gives us an idea of which linguistic features have been formed in contact with the language's closest neighbors or under the influence of nationally or regionally dominant languages. Gawarbati finds itself in a transit zone between different contact areas, and therefore shares features with each of them.

Last updated: 2025-12-04

Source: Department of Linguistics