Research project: Scented AI? Integrating olfaction into Large Language Models

The relationship between language and sensory knowledge is a crucial question in the cognitive sciences. With advancements in Large Language Models (LLMs), this research explores how language can form sensory knowledge without direct experiences.

It aims to empirically test and enhance LLMs' ability to represent sensory experiences, focusing on olfaction, often termed the 'muted sense' because odors are so difficult to put into words. By examining whether LLMs can develop consistent representations of olfactory concepts that align with human interpretations, this project addresses a critical gap in the development of multimodal AI systems.

Objectives include creating a comprehensive dataset reflecting human olfactory perception and developing the first olfaction-enriched open-source language model using this dataset and advanced fine-tuning techniques. Finally, the project will develop the first benchmark for systematically evaluating LLMs' proficiency in olfactory processing, in order to track progress and facilitate further research.

The significance of this research lies in its potential to advance our understanding of how sensory experiences are represented in language. The model is expected to offer both theoretical and practical benefits, such as improved diagnostic methods in healthcare and advances in product development in the perfume and flavour industries.

This project explores how far language-based learning can go in supporting meaningful representations of sensory concepts. In cognitive science, there is active debate about the degree to which sensory understanding depends on direct perceptual experience versus knowledge gained through linguistic description. Rather than arguing for one side, this project uses modern Large Language Models (LLMs) as a practical testbed to investigate what kinds of sensory structure can emerge from language and how closely that structure aligns with human judgments.

The project focuses on olfaction (smell), a particularly challenging domain because odors are often considered difficult to describe precisely in words, which is why smell is sometimes referred to as the “muted sense.” That difficulty makes olfaction a useful case study for examining what information is captured in language about sensory experience, and what may be missing.

The aim is to build tools and evidence that can clarify these questions. Concretely, the project plans to develop the first open-source, olfaction-enriched language model and the supporting resources needed to train and evaluate it. This includes: (1) testing whether existing LLMs encode olfactory relationships in ways that reflect human interpretations; (2) collecting and integrating new human odor-description data and mined olfactory narratives from large text corpora, alongside relevant chemical odor representations, to fine-tune an open-source model (including exploring advanced approaches such as mixture-of-experts); and (3) creating the first dedicated benchmark for measuring “olfactory competence” in language models. Together, these outputs are intended to support more rigorous research on language and sensory meaning, while also enabling downstream applications such as text-based screening of olfactory dysfunction in healthcare and improved analysis of scent-related consumer language in fragrance and flavor development.
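To make item (1) more concrete, one common way to ask whether a model "encodes olfactory relationships in ways that reflect human interpretations" is a representational-similarity check: compute how similar the model considers each pair of odor terms, then measure how well that ordering agrees with human similarity ratings for the same pairs. The sketch below is a hypothetical illustration only, not the project's documented method. The choice of cosine similarity and Spearman rank correlation, the function names, and the toy vectors and ratings are all assumptions made for the example.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (0.0 if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def model_human_alignment(
    embeddings: Dict[str, Sequence[float]],
    human_similarity: Dict[Tuple[str, str], float],
) -> float:
    """Hypothetical alignment score: rank agreement between model cosine
    similarities and human similarity ratings over the same odor-term pairs."""
    model_scores, human_scores = [], []
    for (a, b), rating in human_similarity.items():
        model_scores.append(cosine(embeddings[a], embeddings[b]))
        human_scores.append(rating)
    return spearman(model_scores, human_scores)


if __name__ == "__main__":
    # Made-up placeholder vectors and ratings, for illustration only (not project data).
    emb = {"lemon": [0.9, 0.1, 0.0], "orange": [0.8, 0.2, 0.1], "smoke": [0.0, 0.1, 0.9]}
    ratings = {("lemon", "orange"): 0.9, ("lemon", "smoke"): 0.1, ("orange", "smoke"): 0.2}
    print(model_human_alignment(emb, ratings))
```

A rank correlation is used here because human similarity ratings and model similarity scores live on different scales, so only their relative ordering is compared. A score near 1 would indicate that the model orders odor pairs much as people do; a score near 0 would indicate little agreement.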

Kurfalı, M., Herman, P., Pierzchajlo, S., Olofsson, J., & Hörberg, T. (2025). Representations of smells: The next frontier for language models? Cognition, 264, 106243.
