Henrik Liljegren, Professor

About me

As a linguist, I have a particular interest in the languages of the Hindu Kush-Karakoram region, i.e., the mountainous areas of northern Pakistan, northeastern Afghanistan and the disputed territory of Kashmir. Many of those languages are lesser-described, endangered and under-resourced. I resided for a period of 10 years in northern Pakistan and have carried out fieldwork in individual languages of the region as well as conducted areal-typological research by means of collaborative methods.

Apart from research per se, I advise individuals and communities on their revitalization efforts (orthography and local resource development, mother-tongue-based education, etc.), mentor language activists and scholars from various communities to collect and organize data, and help build networks between local communities and organizations.

Teaching

At Stockholm University, I am involved in supervising thesis work and in teaching general linguistics and language documentation.

Research

My current research focuses on building a language corpus and a lexical database for Gawarbati, and on describing the language, which is one of many sparsely documented and under-resourced languages spoken in the Hindu Kush region. During the period 2021-2024, the Swedish Research Council funded an extensive collection of video and audio data from Gawarbati as well as further processing of the material in the form of transcription, translation and glossing. All data collection was carried out in close collaboration with the local language community and with the language resource center Forum for Language Initiatives (based in Islamabad).

In a previous areal-typological project (2015-2020), I produced a linguistic profile of the Hindu Kush-Karakoram region, based on first-hand data collected from 59 language varieties within the project. One tangible outcome is the online database Hindu Kush Areal Typology: https://hindukush.clld.org/

Research projects

Publications

A selection from the Stockholm University publication database

Language trees with sampled ancestors support a hybrid model for the origin of Indo-European languages

2023. Paul Heggarty (et al.). Science 381 (6656)

Article

Languages of the Indo-European family are spoken by almost half of the world’s population, but their origins and patterns of spread are disputed. Heggarty et al. present a database of 109 modern and 52 time-calibrated historical Indo-European languages, which they analyzed with models of Bayesian phylogenetic inference. Their results suggest an emergence of Indo-European languages around 8000 years before present. This is a deeper root date than previously thought, and it fits with an initial origin south of the Caucasus followed by a branch northward into the Steppe region. These findings lead to a “hybrid hypothesis” that reconciles current linguistic and ancient DNA evidence from both the eastern Fertile Crescent (as a primary source) and the steppe (as a secondary homeland).

The Languages of Peristan through the Lens of Areal Typology

2023. Henrik Liljegren. Roots of Peristan, 391-431

Chapter

In this study, the mountainous region of High Asia referred to as Peristan is outlined and discussed from an areal-linguistic perspective, based on first-hand data from more than fifty separate language varieties. While a comparison of the basic lexicon largely confirms established phylogenetic classification, many structural properties tend to cluster together geographically and often display convergence across phylogenetic boundaries. However, the analysis does not lend support to a simplistic description of the region as a single linguistic area with clear boundaries. Instead, it suggests that a set of languages, situated at the inner parts of Peristan, forms a hard core displaying a significant degree of structural similarity, with a gradual decrease in the number of shared properties towards its fluid outer boundaries. Another significant finding is the identification of three distinct micro-areas within Peristan. These convergence areas map convincingly—but not perfectly—with the Peristan geo-cultural regions suggested in earlier studies, the latter based on (mainly pre-Islamic) shared cultural, social, political and religious identities. The areal-linguistic patterns emerging are likely to be of considerable time-depth and appear to be the result of long-standing cross-community interaction on a sub-regional level.

Nuristani in its areal and typological context

2022. Henrik Liljegren. International Journal of Diachronic Linguistics and Linguistic Reconstruction 19, 201-265

Article

This study presents and details Nuristani, a phylogenetically distinct group of languages spoken in a remote area of northeastern Afghanistan. Largely based on a recently collected data set from six Nuristani varieties, a large number of structural properties, representing several linguistic domains (phonology, grammar and lexico-semantics), were analysed and systematically compared with world-wide typologies as well as with a tight and representative 53-language sample from the surrounding Hindu Kush region. Nuristani emerges as an integral part of region-wide areal patterns, shared to a varying extent with languages belonging to six distinct phylogenies. In a majority of structural domains, Nuristani clusters with a Hindu Kush core, including many Indo-Aryan languages, some Iranian, Tibeto-Burman and the isolate Burushaski. While Nuristani generally comes out as internally homogeneous, one of the languages, Prasun, deviates from that pattern; in certain respects, particularly morpho-syntactically, it clusters more closely with languages other than its closest Nuristani kin, possibly as the result of substratal influence. Only a small number of structural properties can be termed typically Nuristani: the presence of a retroflex approximant, linguistic coding of complex spatial distinctions, kinship suffixes, and a set of fine-tuned discourse markers. Nuristani appears to be the source of subareal patterns detectable also in some neighbouring non-Nuristani communities, most likely related to a shared pre-Muslim context.

Kinship terminologies reveal ancient contact zone in the Hindu Kush

2022. Henrik Liljegren. Linguistic typology 26 (2), 211-245

Article

The Hindu Kush, or the mountain region of northern Pakistan, north-eastern Afghanistan and the northern-most part of the Indian-administered Kashmir region, is home to approximately 50 languages belonging to six different genera: Indo-Aryan, Iranian, Nuristani, Sino-Tibetan, Turkic and the isolate Burushaski. Areality research on this region is only in its early stages, and while its significance as a convergence area has been suggested by several scholars, only a few, primarily phonological and grammatical, features have been studied in a more systematic fashion. Cross-linguistic research in the realms of semantics and lexical organization has been given considerably less attention. However, preliminary findings indicate that features are geographically bundled with one another, across genera, in significant ways, displaying semantic areality on multiple levels throughout the region or in one or more of its sub-regions. The present study is an areal-typological investigation of kinship terms in the region, in which particular attention is paid to a few notable polysemy patterns and what appears to be a significant geographical clustering of these. Comparisons are made between the geographical distribution of such patterns and those of some other linguistic features as well as with relevant non-linguistic factors related to shared cultural values or identities and a long history of small-scale cross-community interaction in different parts of the region.

Ergativity and Gilgiti Shina

2022. Carla F. Radloff, Henrik Liljegren. Languages of Northern Pakistan, 317-347

Chapter

The Hindu Kush–Karakorum and linguistic areality

2020. Henrik Liljegren. Journal of South Asian languages and linguistics 7 (2), 187-233

Article

The high-altitude Hindu Kush–Karakoram region is home to more than 50 language communities, belonging to six phylogenies. The significance of this region as a linguistic area has been discussed in the past, but the tendency has been to focus on individual features and phenomena, and more seldom have there been attempts at applying a higher degree of feature aggregation with tight sampling. In the present study, comparable first-hand data from as many as 59 Hindu Kush–Karakoram language varieties was collected and analyzed. The data allowed for setting up a basic word list as well as for classifying each variety according to 80 binary structural features (phonology, lexico-semantics, grammatical categories, clause structure and word order properties). While a comparison of the basic lexicon across the varieties lines up very closely with the established phylogenetic classification, structural similarity clustering gives results clearly related to geographical proximity within the region and often cuts across phylogenetic boundaries. The strongest evidence of areality tied to the region itself (vis-à-vis South Asia in general on the one hand and Central/West Asia on the other) relates to phonology and lexical structure, whereas morphosyntactic properties mostly place the region’s languages within a larger areal or macro-areal distribution. The overall structural analysis also lends itself to recognizing six distinct micro-areas within the region, lining up with geo-cultural regions identified in previous ethno-historical studies. The present study interprets the domain-specific distributions as layers of areality that are each linked to a distinct historical period, and that taken together paint a picture of a region developing from high phylogenetic diversity, through massive Indo-Aryan penetration and language shifts, to today’s dramatically shrinking diversity and structural stream-lining propelled by the dominance of a few lingua francas.

Emerging epistemic marking in Indo-Aryan Palula

2020. Henrik Liljegren. Evidentiality, egophoricity and engagement, 141-163

Chapter

While evidentiality is neither systematically nor obligatorily signaled in Indo-Aryan Palula [phl; phal1254] (Pakistan), it can be observed in so-called scattered coding. It is most obviously reflected in three sub-systems of the language: a) as a secondary effect of tense-aspect differentiation, most clearly seen in the use of the perfect for indirect evidence vis-à-vis the use of the simple past for direct evidence; b) by a set of utterance-final mood markers, involving an emerging three-way paradigmatic contrast: thaní as quotative, maní as hearsay and ɡa as inferred knowledge; and c) by (at least) one member of a set of second-position discourse particles, xu, marking surprise. Although evidentiality contrasts akin to the perfect vs. simple past were indeed part of the ancestral Indo-Aryan tense system, there are plenty of parallels in adjacent languages to the epistemic contrasts noted for Palula, suggesting that more recent language contact must have contributed to, or largely facilitated, the emergence of epistemic marking in the language.

Gender typology and gender (in)stability in Hindu Kush Indo-Aryan languages

2019. Henrik Liljegren. Grammatical gender and linguistic complexity, 279-328

Chapter

This paper investigates the phenomenon of gender as it appears in 25 Indo-Aryan languages (sometimes referred to as “Dardic”) spoken in the Hindu Kush-Karakorum region – the mountainous areas of northeastern Afghanistan, northern Pakistan and the disputed territory of Kashmir. Looking at each language in terms of the number of genders present, to what extent these are sex-based or non-sex-based, how gender relates to declensional differences, and what systems of assignment are applied, we arrive at a micro-typology of gender in Hindu Kush Indo-Aryan, including a characterization of these systems in terms of their general complexity. Considering the relatively close genealogical ties, the languages display a number of unexpected and significant differences. While the inherited sex-based gender system is clearly preserved in most of the languages, and perhaps even strengthened in some, it is curiously missing altogether in others (such as in Kalasha and Khowar) or seems to be subject to considerable erosion (e.g. in Dameli). That the languages of the latter kind are all found at the northwestern outskirts of the Indo-Aryan world suggests non-trivial interaction with neighbouring languages without gender or with markedly different assignment systems. In terms of complexity, the southwestern-most corner of the region stands out; here we find a few languages (primarily belonging to the Pashai group) that combine inherited sex-based gender differentiation with animacy-related distinctions resulting in highly complex agreement patterns. The findings are discussed in the light of earlier observations of linguistic areality or substratal influence in the region, involving Indo-Aryan, Iranian, Nuristani, Tibeto-Burman, Turkic languages and Burushaski. The present study draws from the analysis of earlier publications as well as from entirely novel field data.

Palula dictionary

2019. Henrik Liljegren. Dictionaria

Article

Supporting and sustaining language vitality in northern Pakistan

2018. Henrik Liljegren. The Routledge Handbook of Language Revitalization, 427-437

Chapter

Northern Pakistan is linguistically and culturally very diverse. Nearly 30 languages—representing a wide span, numerically and vitality-wise—are spoken in this mountainous region, sharing ties with adjacent areas of neighboring countries. Although most of these languages have received little outside recognition, there have been few restrictions for those wanting to promote their languages. Therefore, a number of sustaining efforts have been made in recent years, exemplified throughout the chapter: collaborative fieldwork, the formation of language organizations, training in documentation, the development of orthographies, publications, the introduction of mother-tongue schools, and lobbying for the region’s languages. Evaluating some of those activities and their effectiveness in terms of language maintenance and revitalization, some key factors stand out: community ownership, institutional support, pooling of resources, and multi-community collaboration. The observations and subsequent analysis are informed by the author’s own long-term involvement in the development of the Forum for Language Initiatives.

Geomorphic coding in Palula and Kalasha

2018. Jan Heegård, Henrik Liljegren. Acta Linguistica Hafniensia. International Journal of Structural Linguistics 50 (2), 129-160

Article

The article describes the geomorphic systems of spatial reference in the two Indo-Aryan languages Palula and Kalasha, spoken in adjacent areas of an alpine region in Northwestern Pakistan. Palula and Kalasha encode the inclination of the mountain slope as well as the flow of the river, in systematic and similar ways, and by use of distinct sets of nominal lexemes that may function adverbially. In their verbal systems, only Palula encodes landscape features in a systematic way, but both languages make use of a number of verbal sets that in different ways emphasise boundary-crossing. The article relates the analysis to Palmer's Topographic Correspondence Hypothesis that predicts that the linguistic system of spatial reference will reflect the topography of the surrounding landscape. The analysis of the geomorphic systems in Palula and Kalasha supports this hypothesis. However, data from a survey of spatial strategies in neighbouring languages, i.e., languages spoken in a similar alpine landscape, reveal another system that does not to the same extent or in a similar way encode typical landscape features such as the mountain slope and the flow of the river. This calls for a revision of Palmer's hypothesis that also takes language contact into consideration.

Bisyndetic Contrast Marking in the Hindukush

2017. Henrik Liljegren, Erik Svärd. Journal of Language Contact 10 (3), 450-484

Article

A contrastive (or antithetical) construction which makes simultaneous use of two separate particles is identified through a mainly corpus-based study as a typical feature of a number of lesser-described languages spoken in the Afghanistan-Pakistan borderland in the high Hindukush. The feature encompasses Nuristani languages (Waigali, Kati) as well as the Indo-Aryan languages found in their close vicinity (Palula, Kalasha, Dameli, Gawri), while it is not shared by more closely related Indo-Aryan languages spoken outside of this geographically delimited area. Due to a striking (although not complete) overlap with at least two other (unrelated) structural features, pronominal kinship suffixes and retroflex vowels, we suggest that a linguistic and cultural diffusion zone of considerable age is centred in the mountainous Nuristan-Kunar-Panjkora area.

Semantic patterns from an areal perspective

2017. Maria Koptjevskaja-Tamm, Henrik Liljegren. The Cambridge handbook of areal linguistics, 204-236

Chapter

Khowar

2017. Henrik Liljegren, Afsar Ali Khan. Journal of the International Phonetic Association 47 (2), 219-229

Article

Khowar (ISO 639-3: khw) is an Indo-Aryan language spoken by 200,000–300,000 (Decker 1992: 31–32; Bashir 2003: 843) people in Pakistan's Khyber Pakhtunkhwa Province (formerly North-West Frontier Province). The majority of the speakers are found in Chitral (a district and erstwhile princely state bordering Afghanistan, see Figure 1), where the language is used as a lingua franca, but there are also important pockets of speaker groups in adjacent areas of Gilgit-Baltistan and Swat District as well as a considerable number of recent migrants to larger cities such as Peshawar and Rawalpindi (Decker 1992: 25–26). Its closest linguistic relative is Kalasha, a much smaller language spoken in a few villages in southern Chitral (Morgenstierne 1961: 138; Strand 1973: 302, 2001: 252). While Khowar has preserved a number of features (phonological, morphological as well as lexical) now lost in other Indo-Aryan languages of the surrounding Hindukush-Karakoram mountain region, it has, over time, incorporated a massive amount of lexical material from neighbouring or influential Iranian languages (Morgenstierne 1936) – and with it, new phonological distinctions. Certain features might also be attributable to formerly dominant languages (e.g. Turkic), or to linguistic substrates, either in the form of, or related to, the language isolate Burushaski, or other, now extinct, languages previously spoken in the area (Morgenstierne 1932: 48, 1947: 6; Bashir 2007: 208–214). There is relatively little dialectal variation among the speakers in Chitral itself, probably attributable to the relative recency of the present expansion of the language (Morgenstierne 1932: 50).

Profiling Indo-Aryan in the Hindukush-Karakoram: A preliminary study of micro-typological patterns

2017. Henrik Liljegren. Journal of South Asian languages and linguistics 4 (1), 107-156

Article

The study is a typological profile of 31 Indo-Aryan (IA) languages in the Hindukush-Karakoram-Western Himalayan region (covering NE Afghanistan, N Pakistan, and parts of Kashmir). Native speakers were recruited to provide comparative data. This data, supplemented by reputable descriptions or field notes, was evaluated against a number of WALS- or WALS-like features, enabling a fine-tuned characterization of each language, taking different linguistic domains into account (phonology, morphology, syntax, lexicon). The emerging patterns were compared with global distributions as well as with characteristic IA features and well-known areal patterns. Some features, mainly syntactic, turned out to be shared with IA in general, whereas others do have scattered reflexes in IA outside of the region but are especially prevalent in the region: large consonant inventories, tripartite pronominal case alignment, a high frequency of left-branching constructions, and multi-degree deictic systems. Yet other features display a high degree of diversity, often bundling subareally. Finally, there was a significant clustering of features that are not characterizing IA in general: tripartite affricate differentiation, retroflexion across several subsets, aspiration contrasts involving voiceless consonants only, tonal contrasts and 20-based numerals. This clustering forms a “hard core” at the centre of the region, gradually fading out toward its peripheries.

A grammar of Palula

2016. Henrik Liljegren.

Book

This grammar provides a grammatical description of Palula, an Indo-Aryan language of the Shina group. The language is spoken by about 10,000 people in the Chitral district in Pakistan’s Khyber Pakhtunkhwa Province. This is the first extensive description of the formerly little-documented Palula language, and is one of only a few in-depth studies available for languages in the extremely multilingual Hindukush-Karakoram region. The grammar is based on original fieldwork data, collected over the course of about ten years, commencing in 1998. It is primarily in the form of recorded, mainly narrative, texts, but supplemented by targeted elicitation as well as notes of observed language use. All fieldwork was conducted in close collaboration with the Palula-speaking community, and a number of native speakers took active part in the process of data gathering, annotation and data management. The main areas covered are phonology, morphology and syntax, illustrated with a large number of example items and utterances, but also a few selected lexical topics of some prominence have received a more detailed treatment as part of the morphosyntactic structure. Suggestions for further research that should be undertaken are given throughout the grammar. The approach is theory-informed rather than theory-driven, but an underlying functional-typological framework is assumed. Diachronic development is taken into account, particularly in the area of morphology, and comparisons with other languages and references to areal phenomena are included insofar as they are motivated and available. The description also provides a brief introduction to the speaker community and their immediate environment.


