Stockholm University

Anna Persson

About me

I am a lecturer in Swedish as a Second Language at the Department of Swedish Language and Multilingualism. In November 2024, I defended my thesis in Scandinavian Languages on pre-linguistic normalization and vowel perception. My research interests broadly concern how listeners process spoken language and, more specifically, how the brain deals with the fact that we all vary in our pronunciations. I investigate the underlying cognitive mechanisms that enable stable speech perception across talkers, using a combination of acoustic analysis, perception experiments and computational models.

I have a background in teaching Swedish as a second language to high school and adult students. I have also worked on test item writing, test validation and assessment for Tisus (Test in Swedish for University Studies).

Teaching

  • Introduction to psycholinguistics
  • Language assessment for teachers of Swedish as a second language
  • Qualifying course in Swedish for university studies

Research

As talkers, we all differ in our pronunciations, resulting in cross-talker differences in the mapping between acoustic cues and linguistic categories and meanings. From previous work, we know that listeners have a remarkable ability to rapidly adapt to the pronunciations of an unfamiliar talker, leading to stable cross-talker perception. What remains less clear is which specific mechanisms underlie this adaptive ability. A long-standing hypothesis in the literature is that listeners achieve stable cross-talker perception by normalizing the acoustic signal for talker-specific characteristics related to anatomical differences between talkers (e.g., vocal tract length). Numerous accounts of pre-linguistic normalization have been proposed over the years. These accounts are widely used in variationist sociolinguistics, sociophonetics, and dialectology, where they have often been compared and evaluated on how well they reduce category variability in vowel spaces. Less is known about their relative plausibility as models of human speech perception, that is, how well they can explain what humans actually do. In my thesis, I investigate the predicted consequences of vowel normalization for stable cross-talker perception, using Swedish and English vowels. I approach this question through acoustic analysis, computational models and vowel perception experiments. In addition, I report on the static and dynamic acoustic characteristics of the modern-day Central Swedish vowel space.
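As a rough illustration of what normalizing the signal by talker can mean, the sketch below centers or standardizes (z-scores) F1 and F2 within each talker, in the spirit of Lobanov normalization. It is a minimal sketch under stated assumptions, not the implementation used in the thesis: the function name `normalize_by_talker`, the tuple layout, and all formant values are invented for illustration, and the population standard deviation is an arbitrary choice.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_by_talker(tokens, standardize=True):
    """Center (and optionally standardize) F1 and F2 within each talker.

    tokens: list of (talker, f1_hz, f2_hz) tuples (hypothetical layout).
    Returns a list of (talker, f1_norm, f2_norm) in the same order.
    Assumes each talker has at least two distinct values per formant,
    otherwise the standard deviation is zero and standardizing fails.
    """
    # Group the formant values by talker.
    by_talker = defaultdict(list)
    for talker, f1, f2 in tokens:
        by_talker[talker].append((f1, f2))

    # Per-talker mean and (population) standard deviation for each formant.
    stats = {}
    for talker, values in by_talker.items():
        f1s, f2s = zip(*values)
        stats[talker] = ((mean(f1s), pstdev(f1s)), (mean(f2s), pstdev(f2s)))

    normalized = []
    for talker, f1, f2 in tokens:
        (m1, s1), (m2, s2) = stats[talker]
        if not standardize:
            s1 = s2 = 1.0  # centering only: subtract the talker mean
        normalized.append((talker, (f1 - m1) / s1, (f2 - m2) / s2))
    return normalized


# Invented toy data: talker B produces the same three vowels as talker A,
# but with every formant 20% higher (e.g., a shorter vocal tract).
tokens = [
    ("A", 300.0, 2300.0), ("A", 700.0, 1200.0), ("A", 500.0, 1500.0),
    ("B", 360.0, 2760.0), ("B", 840.0, 1440.0), ("B", 600.0, 1800.0),
]

centered = normalize_by_talker(tokens, standardize=False)
normalized = normalize_by_talker(tokens, standardize=True)

for row in normalized:
    print(row)
```

In this toy case the two talkers differ only by a constant scale factor, so after standardizing their vowels land on the same normalized values, while centering alone removes the difference in means but not in spread. Real talkers differ in less regular ways, which is why the competing accounts need to be evaluated empirically.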

Publications

A selection from the Stockholm University publication database

  • Evaluating normalization accounts against the dense vowel space of Central Swedish

    2023. Anna Persson, T. Florian Jaeger. Frontiers in Psychology 14

    Article

    Talkers vary in the phonetic realization of their vowels. One influential hypothesis holds that listeners overcome this inter-talker variability through pre-linguistic auditory mechanisms that normalize the acoustic or phonetic cues that form the input to speech recognition. Dozens of competing normalization accounts exist, including both accounts specific to vowel perception and general purpose accounts that can be applied to any type of cue. We add to the cross-linguistic literature on this matter by comparing normalization accounts against a new phonetically annotated vowel database of Swedish, a language with a particularly dense vowel inventory of 21 vowels differing in quality and quantity. We evaluate normalization accounts on how they differ in predicted consequences for perception. The results indicate that the best performing accounts either center or standardize formants by talker. The study also suggests that general purpose accounts perform as well as vowel-specific accounts, and that vowel normalization operates in both temporal and spectral domains.

  • Comparing theories of pre-linguistic normalization for vowel perception

    2024. Anna Persson.

    Thesis (Doc)

    The present thesis compares competing theories of pre-linguistic normalization for the perception of Swedish and English vowels. Specifically, the overall aim is to investigate whether normalization might be key to understanding the mechanisms supporting robust cross-talker perception, and to gain more insights into the specific computations involved. The thesis is based on three articles that employ acoustic analysis, behavioral experiments and computational modeling to address the question of vowel normalization.

    Article I uses a novel phonetically annotated database of Swedish vowel recordings, the SwehVd, to provide an updated acoustic description of the Central Swedish vowel system and to evaluate certain claims about cue-to-category mappings made in previous work. Replicating previous studies, the results of Article I suggest that F1, F2 and vowel duration are the most important cues to vowel identity in Central Swedish. In addition, the results highlight the importance of formant dynamics for reliable category distinctions. The acoustic characteristics reported in Article I further constitute the input to the computational modeling presented in Article II.

    Article II evaluates 15 competing normalization accounts in terms of how well they predict the intended vowel category of Central Swedish, as represented by the talkers in SwehVd. Specifically, a computational model of vowel perception, a Bayesian ideal observer, is used to assess the predicted consequences of normalization. The results indicate that normalization accounts that assume the learning and storing of talker-specific acoustics (i.e., extrinsic accounts) achieve the best fit against vowel production data. The evaluation against the SwehVd database further contributes to the insight that languages with dense vowel spaces do not necessarily require more complex normalization mechanisms.

    Article III evaluates 20 different normalization accounts in terms of how well they predict listeners' categorization behavior in two vowel categorization experiments on US English vowels. Paralleling Article II, the results indicate that more complex extrinsic normalization is needed for robust cross-talker perception. However, it is a computationally minimalist extrinsic account, uniform scaling, that provides the best fit when evaluated against listeners' responses. This would seem to suggest that more complex computations (as in, e.g., Lobanov normalization) are not required for human speech perception.

    The thesis aimed for a broad-scale evaluation of competing theories of pre-linguistic normalization, assessing the predictions of different accounts using different types of experiment stimuli, different vowel spaces, and different sets of acoustic cues. This broad-scale evaluation was made possible through the implementation of a holistic and stringent computational framework, for an unbiased comparison of accounts. The main contributions of this thesis include the open-access publication of the framework and the vowel database, to facilitate replication and future studies.

  • The acoustic characteristics of Swedish vowels

    2025. Anna Persson. Phonetica 81 (6), 599-643

    Article

    The Swedish vowel space is relatively densely populated with 21 categories that differ in quality and quantity. Existing descriptions of the entire space rest on recordings made in the late 1990s or earlier, while recent work has generally focused on subsets of the space. The present paper reports on static and dynamic acoustic analyses of the entire vowel space using a recently released database of h-VOWEL-d words (SwehVd). The results highlight the importance of static and dynamic spectral and temporal cues for Swedish vowel category distinction. The first two formants and vowel duration are the primary acoustic cues to vowel identity; however, the third formant contributes to increased category separability for neighboring contrasts presumed to differ in lip-rounding. In addition, even though all long-short vowel pairs differ systematically in duration, they also display considerable spectral differences, suggesting that quantity distinctions are not separate from quality distinctions in Swedish. The dynamic analysis further suggests formant movements in both long and short vowels, with [e:] and [o:] displaying clearer patterns of diphthongization.


