Research project Structured Multilinguality for Natural Language Processing
How do we go from handling one or a few languages in modern language technology to handling thousands? And how can this be done by drawing on the insights and experience of language researchers, rather than simply adding more data and more computing power? These are the questions investigated in this project.
Modern language processing tools, such as Google Translate, the spell checker on your phone, and many others you never think about, need to handle an increasing number of languages. There are around 7,000 languages in the world, but the vast majority of them are never considered when building tools for natural language processing. These 7,000 languages are not all completely different, however: many share a common ancestry, and others have borrowed from each other over the course of history. Just as it is easier for a Swedish speaker to learn German than Japanese, we want to investigate new methods that enable computers to work not just with many languages, but with many languages that share words and grammar in diverse and complex ways.
Project description
The goal of this project is to investigate new methods for using structured information about language variation to improve multilingual natural language processing models. The first part of the project is devoted to basic research on how to represent linguistic information about languages in a way that is maximally useful to neural models, while the second part applies these representations to practical natural language processing tasks, such as machine translation and morphosyntactic analysis.
Project members
Project managers
Robert Mikael Östling
Docent
Members
Isabelle Augenstein
Associate Professor
Johannes Bjerva
Associate Professor