Kristina Nilsson Björkenstam



Works at Office of Human Science
Telephone 08-16 39 28
Visiting address Universitetsvägen 10 A, plan 5
Room A315
Postal address Samhällsvetenskapliga fakultetskansliet 106 91 Stockholm

About me

Director of education, Office of Human Science.

PhD in Computational Linguistics.

Director of Studies for General Linguistics, Phonetics and Computational Linguistics (first and second cycle) and Co-ordinating Director of Studies for Linguistics and Sign Language at the Department of Linguistics (currently on leave). Researcher in Computational Linguistics.



Recent publications
Sofia Strömbergsson, Jana Götze, Jens Edlund, and Kristina Nilsson Björkenstam (2021). "Simulating Speech Error Patterns Across Languages and Different Datasets". Language and Speech. Online first, 2021-02-26.
Kristina Nilsson Björkenstam and Gintare Grigonyte (2020). "I Know Words, I Have the Best Words: Repetitions, Parallelisms, and Matters of (In)Coherence". In Schneider, U. and Eitelmann, M. (eds), Linguistic Inquiries into Donald Trump’s Language: From 'Fake News' to 'Tremendous Success'. London: Bloomsbury Academic.


A selection from Stockholm University publication database
  • 2020. Kristina Nilsson Björkenstam, Gintare Grigonyte. Linguistic Inquiries into Donald Trump’s Language, 41-61
  • 2019. Carla Wikse Barrow, Kristina Nilsson Björkenstam, Sofia Strömbergsson. Journal of Child Language 46 (2), 199-213

    This study aimed to investigate concerns of validity and reliability in subjective ratings of age-of-acquisition (AoA), through exploring characteristics of the individual rater. An additional aim was to validate the obtained AoA ratings against two corpora – one of child speech and one of adult speech – specifically exploring whether words over-represented in the child-speech corpus are rated with lower AoA than words characteristic of the adult-speech corpus. The results show that less than one-third of participating informants’ ratings are valid and reliable. However, individuals with high familiarity with preschool-aged children provide more valid and reliable ratings, compared to individuals who do not work with or have children of their own. The results further show a significant, age-adjacent difference in rated AoA for words from the two different corpora, thus strengthening their validity. The study provides AoA data, of high specificity, for 100 child-specific and 100 adult-specific Swedish words.

  • 2018. Paul Ibbotson, Rose M. Hartman, Kristina Nilsson Björkenstam. Language, Cognition and Neuroscience 33 (6), 1-15

    We present an open-access analytic tool, which allows researchers to simultaneously control for and combine language data from the child, the caregiver, multiple languages, and across multiple time points to make inferences about the social and cognitive factors driving the shape of language development. We demonstrate how the tool works in three domains of language learning and across six languages. The results demonstrate the usefulness of this approach as well as providing deeper insight into three areas of language production and acquisition: egocentric language use, the learnability of nouns versus verbs, and imageability. We have made the Frequency Filter tool freely available as an R-package for other researchers to use at

  • 2017. Sofia Strömbergsson (et al.). Proceedings of the 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), Stockholm: The International Speech Communication Association (ISCA), 2017, 2214-2217

    Child-directed spoken data is the ideal source of support for claims about children’s linguistic environments. However, phonological transcriptions of child-directed speech are scarce, compared to sources like adult-directed speech or text data. Acquiring reliable descriptions of children’s phonological environments from more readily accessible sources would mean considerable savings of time and money. The first step towards this goal is to quantify the reliability of descriptions derived from such secondary sources. We investigate how phonological distributions vary across different modalities (spoken vs. written), and across the age of the intended audience (children vs. adults). Using a previously unseen collection of Swedish adult- and child-directed spoken and written data, we combine lexicon look-up and grapheme-to-phoneme conversion to approximate phonological characteristics. The analysis shows distributional differences across datasets both for single phonemes and for longer phoneme sequences. Some of these are predictably attributed to lexical and contextual characteristics of text vs. speech. The generated phonological transcriptions are remarkably reliable. The differences in phonological distributions between child-directed speech and secondary sources highlight a need for compensatory measures when relying on written data or on adult-directed spoken data, and/or for continued collection of actual child-directed speech in research on children’s language environments.

  • 2017. Kristina Nilsson Björkenstam, Gintare Grigonyte. Språktidningen (2), 24-27
  • 2016. Kristina Nilsson Björkenstam, Mats Wirén, Robert Östling. The 54th Annual Meeting of the Association for Computational Linguistics, 82-90

    How do infants learn the meanings of their first words? This study investigates the informativeness and temporal dynamics of non-verbal cues that signal the speaker's referent in a model of early word–referent mapping. To measure the information provided by such cues, a supervised classifier is trained on information extracted from a multimodally annotated corpus of 18 videos of parent–child interaction with three children aged 7 to 33 months. Contradicting previous research, we find that gaze is the single most informative cue, and we show that this finding can be attributed to our fine-grained temporal annotation. We also find that offsetting the timing of the non-verbal cues reduces accuracy, especially if the offset is negative. This is in line with previous research, and suggests that synchrony between verbal and non-verbal cues is important if they are to be perceived as causally related.

  • 2014. Kristina Nilsson Björkenstam, Sofia Gustafson Capková, Mats Wirén. Strindberg on International Stages/Strindberg in Translation

    We have approached the works of August Strindberg from a computational linguistic point of view, resulting in The Stockholm University Strindberg Corpus, consisting of seven of Strindberg's autobiographical works with linguistic annotation. The corpus is freely available for research. We use this corpus for three quantitative studies of Strindberg’s work: in the first, we describe the novels included in the corpus by keywords; in the second, we compare Strindberg’s use of emotionally charged words with selected prose of both his contemporaries and present-day authors; in the third, we explore the semantic prosody of KVINNA (“woman”) and MAN (“man”).

  • 2013. Kristina Nilsson Björkenstam. SUC-CORE. Northern European Journal of Language Technology (NEJLT) 3 (2), 19-39

    This paper describes SUC-CORE, a subset of the Stockholm Umeå Corpus and the Swedish Treebank annotated with noun phrase coreference. While most coreference-annotated corpora consist of texts of similar types within related domains, SUC-CORE consists of both informative and imaginative prose and covers a wide range of literary genres and domains. This allows for exploration of coreference across different text types, but it also means that there are limited amounts of data within each type. Future work on coreference resolution for Swedish should include making more annotated data available for the research community.


Last updated: March 1, 2021
