Murathan Kurfalı, Postdoctoral Researcher

  • Behind the book reviews

    Article
    2025. Henrik Fürst, Murathan Kurfalı.

    Book reviews are instrumental in assessing the public value of literary works. However, the path to literary recognition remains unclear: it is not well understood why certain books are reviewed at all, and why some earn acclaim while others face criticism. Amid ongoing discussions of a “crisis of criticism,” this study scrutinizes that discourse and employs two regression analyses to examine the occurrence and sentiment of reviews for 9,814 fiction books originally written in Swedish and 8,340 reviews from major Swedish newspapers (2001–2018). Review sentiment is determined through a novel automated pipeline incorporating state-of-the-art natural language processing models (a toy sketch of both analyses follows the publication list). The article reveals enduring institutional, cultural, and demographic hierarchies that shape literary recognition through the occurrence of reviews. Key factors include literary prestige and author youth, alongside ties to elite literary networks, publication by major houses, and a focus on literary fiction or poetry. Conversely, older authors with numerous books, bestsellers, or a focus on children’s literature exhibit reduced review probabilities. While review occurrence is highly predictable, sentiment proves less so. Moreover, reviews generally express positivity despite frequent reviewer discord. Our findings suggest that books are selected for review based on external factors such as reputation and credentials, reflecting persistent patterns of cultural consecration. This resonates with the idea of a “conservative revolution,” in which established hierarchies maintain their influence, rather than indicating a “crisis of criticism.” Nevertheless, the unpredictability of review sentiment and the lack of consistent consensus on quality underscore a deeper evaluative uncertainty that transcends the more stable hierarchies governing review selection.

  • Chemosensory vocabulary in wine, perfume and food product reviews

    Article
    2025. Thomas Hörberg, Murathan Kurfali, Jonas K. Olofsson.

    Chemosensory sensations are often hard to describe and quantify. Language models may facilitate a systematic understanding of sensory descriptions. We accessed consumer and expert reviews of wine, perfume, and food products (English language; about 68 million words in total) and analyzed their sensory descriptions. Using a novel data-driven method based on natural language data, we compared the three chemosensory vocabularies (wine, perfume, food) with respect to their vocabulary overlap and semantic properties, and explored their semantic spaces (a toy sketch of this comparison follows the publication list). The three vocabularies primarily differ with respect to domain specificity, concreteness, descriptor type preference, and degree of gustatory versus olfactory association. Wine vocabulary primarily distinguishes between white wine and red wine flavors and qualities. Food vocabulary separates drinkable and edible food products and ingredients, on the one hand, and savory and non-savory products, on the other. A salient distinction in all three vocabularies is between concrete and abstract/evaluative terms. Valence also plays a role in the semantic spaces of all three vocabularies, but it is less prominent here than in general olfactory vocabulary. Our method allows a systematic comparison of sensory descriptors in the three product domains and provides a data-driven approach to deriving sensory lexicons that can be applied by sensory scientists.

  • LLM-based post-editing as reference-free GEC evaluation

    Conference
    2025. Robert Östling, Murathan Kurfalı, Andrew Caines.

    Evaluation of Grammatical Error Correction (GEC) systems is becoming increasingly challenging: as system quality increases, traditional automatic metrics fail to adequately capture nuances such as fluency versus minimal edits, alternative valid corrections beyond the ‘ground truth’, and the difference between corrections that are useful in a language-learning scenario and those preferred by native readers. Previous work has suggested human post-editing of GEC system outputs, but this is very labor-intensive. We investigate the use of Large Language Models (LLMs) as post-editors of English and Swedish texts, and perform a meta-analysis of a range of evaluation setups using a set of recent GEC systems. We find that for the two languages studied, automatic evaluation based on post-editing agrees well with both human post-editing and direct human rating of GEC systems. Furthermore, we find that a simple n-gram overlap metric is sufficient to measure post-editing distance (a toy implementation follows the publication list), and that including human references when prompting the LLMs generally does not improve agreement with human ratings. The resulting evaluation metric is reference-free and requires no language-specific training or additional resources beyond an LLM capable of handling the given language.

  • Representations of smells

    Article
    2025. Murathan Kurfalı, Pawel Herman, Stephen Pierzchajlo, Jonas K. Olofsson, Thomas Hörberg.

    Whereas human cognition develops through perceptually driven interactions with the environment, language models (LMs) are “disembodied learners,” which might limit their usefulness as model systems. We evaluate the ability of LMs to recover sensory information from natural language, addressing a significant gap in the cognitive science research literature. Our investigation is carried out through the sense of smell (olfaction), because it is severely underrepresented in natural language and thus poses a unique challenge for linguistic and cognitive modeling. By systematically evaluating three generations of LMs, including static word embedding models (Word2Vec, FastText), encoder-based models (BERT), and decoder-based large LMs (LLMs; GPT-4o and Llama 3.1, among others), under nearly 200 training configurations, we investigate their proficiency in approximating human odor perception from textual data (a toy version of this evaluation follows the publication list). As benchmarks, we use three diverse experimental odor datasets: odor similarity ratings, imagined similarities of odor pairings from word labels, and odor-to-label ratings. The results reveal that LMs can accurately represent olfactory information, and describe the conditions under which this is realized. Static, simpler models perform best in capturing odor-perceptual similarities under certain training configurations, while GPT-4o excels in simulating olfactory-semantic relationships, as suggested by its superior performance on datasets where the collected odor similarities are derived from word-based assessments. Our findings show that natural language encodes latent information about human olfactory perception that is retrievable through text-based LMs to varying degrees. Our research shows promise for LMs as tools for investigating the long-debated relation between symbolic representations and perceptual experience in cognitive science.

  • Towards better language representation in Natural Language Processing

    Article
    2025. Arianna Masciolini, Andrew Caines, Orphée De Clercq, Joni Kruijsbergen, Murathan Kurfalı, Ricardo Muñoz Sánchez, Elena Volodina, Robert Östling, Kais Allkivi, Špela Arhar Holdt, Ilze Auziņa, Roberts Darģis, Elena Drakonaki, Jennifer-Carmen Frey, Isidora Glišić, Pinelopi Kikilintza, Lionel Nicolas, Mariana Romanyshyn, Alexandr Rosen, Alla Rozovskaya, Kristjan Suluste, Oleksiy Syvokon, Alexandros Tantos, Despoina-Ourania Touriki, Konstantinos Tsiotskas, Eleni Tsourilla, Vassilis Varsamopoulos, Katrin Wisniewski, Aleš Žagar, Torsten Zesch.

    This paper introduces MultiGEC, a dataset for multilingual Grammatical Error Correction (GEC) in twelve European languages: Czech, English, Estonian, German, Greek, Icelandic, Italian, Latvian, Russian, Slovene, Swedish and Ukrainian. MultiGEC distinguishes itself from previous GEC datasets in that it covers several underrepresented languages, which we argue should be included in the resources used to train models for Natural Language Processing tasks that, like GEC itself, have implications for Learner Corpus Research and Second Language Acquisition. Aside from its multilingualism, the novelty of the MultiGEC dataset is that it consists of full texts, typically learner essays, rather than individual sentences, making it possible to train systems that take broader context into account (a toy loader illustrating this text-level framing follows the publication list). The dataset was built for MultiGEC-2025, the first shared task in multilingual text-level GEC, but it remains accessible after the task's competitive phase, serving as a resource for training new error correction systems and performing cross-lingual GEC studies.

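The toy sketches referenced in the entries above follow. All are illustrative assumptions, not the papers' released code.

For “Behind the book reviews”: a minimal sketch of the two analyses, a regression on review occurrence plus automated sentiment scoring. The column names, toy rows, and the default English sentiment model are placeholders; the paper analyses 9,814 Swedish books with models its abstract does not name.

```python
# Illustrative toy, not the authors' pipeline. Columns and rows are
# invented; the paper analyses thousands of books, not six.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from transformers import pipeline

books = pd.DataFrame({
    "major_publisher":  [1, 0, 1, 0, 1, 0],
    "literary_fiction": [1, 1, 0, 0, 1, 0],
    "author_age":       [34, 61, 45, 52, 29, 70],
    "reviewed":         [1, 0, 1, 0, 1, 0],  # outcome: was the book reviewed?
})

# Analysis 1: regression on review occurrence.
X = books[["major_publisher", "literary_fiction", "author_age"]]
occurrence = LogisticRegression().fit(X, books["reviewed"])
print(dict(zip(X.columns, occurrence.coef_[0])))

# Analysis 2: automated review sentiment. The default English model is a
# stand-in for whatever Swedish-capable model the paper's pipeline uses.
sentiment = pipeline("sentiment-analysis")
print(sentiment(["A luminous, unforgettable debut.",
                 "Overwrought and ultimately tedious."]))
```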
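
For “Chemosensory vocabulary in wine, perfume and food product reviews”: a minimal sketch of comparing vocabularies by overlap and by position in an embedding space. The descriptor sets and corpus are invented stand-ins for vocabularies mined from roughly 68 million words of real reviews.

```python
# Illustrative toy, not the study's method. Descriptor sets and corpus
# are invented placeholders.
from gensim.models import Word2Vec

wine    = {"oaky", "tannic", "fruity", "sweet", "crisp"}
perfume = {"musky", "floral", "fruity", "sweet", "woody"}
food    = {"savory", "salty", "fruity", "sweet", "creamy"}

def jaccard(a, b):
    """Vocabulary overlap between two descriptor sets."""
    return len(a & b) / len(a | b)

print("wine/perfume:", jaccard(wine, perfume))
print("wine/food:   ", jaccard(wine, food))

# A shared semantic space: embeddings trained on (here: toy) review text,
# so descriptor similarities can be compared across the three domains.
corpus = [["oaky", "tannic", "red", "wine"],
          ["musky", "floral", "woody", "perfume"],
          ["sweet", "fruity", "creamy", "dessert"],
          ["savory", "salty", "crisp", "snack"]] * 50
space = Word2Vec(corpus, vector_size=25, min_count=1, seed=0)
print("sweet ~ fruity:", space.wv.similarity("sweet", "fruity"))
```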
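
For “LLM-based post-editing as reference-free GEC evaluation”: a minimal implementation of an n-gram overlap score between a GEC system's output and its post-edited version. Bigram Dice overlap is an assumed scheme chosen for illustration, and the hand-written post-edit stands in for what an LLM prompted to minimally correct the text would produce.

```python
# Illustrative toy: post-editing distance as n-gram overlap (bigram Dice,
# an assumed choice). High overlap = few edits = good GEC output.
from collections import Counter

def ngrams(tokens, n=2):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def overlap(system_output, post_edit, n=2):
    a, b = ngrams(system_output.split(), n), ngrams(post_edit.split(), n)
    matched = sum((a & b).values())
    return 2 * matched / (sum(a.values()) + sum(b.values()))

# The "post-edit" below would normally come from prompting an LLM to
# minimally correct the system output; a hand-written edit stands in.
gec_output = "She go to school every days ."
post_edit  = "She goes to school every day ."
print(f"overlap = {overlap(gec_output, post_edit):.2f}")
```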
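
For “Representations of smells”: a minimal sketch of one benchmark comparison, correlating embedding-derived similarities of odor words with human similarity ratings. The corpus, word list, and ratings are invented placeholders for the paper's nearly 200 training configurations and three odor datasets.

```python
# Illustrative toy: correlate model similarities for odor words with
# human odor-similarity ratings. All data below are invented.
from itertools import combinations
from gensim.models import Word2Vec
from scipy.stats import spearmanr

corpus = [["rose", "jasmine", "sweet", "floral"],
          ["smoke", "tar", "burnt", "ash"],
          ["lemon", "citrus", "fresh", "sharp"]] * 100
model = Word2Vec(corpus, vector_size=25, min_count=1, seed=0)

pairs = list(combinations(["rose", "jasmine", "smoke", "lemon"], 2))
human_ratings = [0.9, 0.1, 0.2, 0.1, 0.2, 0.1]  # placeholder judgements
model_sims = [model.wv.similarity(a, b) for a, b in pairs]

rho, p = spearmanr(human_ratings, model_sims)
print(f"Spearman rho = {rho:.2f} (p = {p:.2f})")
```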
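
For “Towards better language representation in Natural Language Processing”: a minimal loader illustrating the text-level (rather than sentence-level) framing. The paired-directory layout and file names are assumptions for illustration only, not MultiGEC's actual distribution format.

```python
# Illustrative toy: consume a full-text GEC corpus as (source, target)
# essay pairs. The orig/ and corr/ layout is an invented stand-in for
# MultiGEC's real format.
from pathlib import Path

def essay_pairs(root):
    for orig in sorted(Path(root, "orig").glob("*.txt")):
        corr = Path(root, "corr", orig.name)
        # Whole essays, so a system can use document-level context
        # instead of correcting isolated sentences.
        yield orig.read_text(encoding="utf-8"), corr.read_text(encoding="utf-8")

for source, target in essay_pairs("multigec/swedish"):
    print(len(source.split()), "->", len(target.split()), "tokens")
```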

Scented AI? Integrating smell into large language models

This research project aims to develop advanced language models based on artificial intelligence (AI) so that they can represent sensory experiences such as smells. By focusing on the sense of smell, often described as “the silent sense” because of its limited representation in language, the project aims to create the first “scent-enriched” language model.