Research project Structured Multilinguality for Natural Language Processing
How do we go from handling one or a few languages in modern language technology to handling thousands? And how can this be done by drawing on the insights and experience of language researchers, rather than simply adding more data and more computing power? These are the questions investigated in this project.
Modern language processing tools, such as Google Translate, the spell checker on your phone, and many others you never think about, need to handle an increasing number of languages. There are around 7,000 languages in the world, but the vast majority of them are never considered when building tools for natural language processing. These 7,000 languages are not all completely different, however: many share a common ancestry, and others have borrowed from each other over the course of history. Just as it is easier for a Swedish speaker to learn German than Japanese, we want to investigate new methods that enable computers to work not just with many languages, but with many languages that share words and grammar in diverse and complex ways.
Project description
The goal of this project is to investigate new methods for using structured information about language variation to improve multilingual natural language processing models. The first part of the project is devoted to basic research on how to represent linguistic information about languages in a way that is maximally useful to neural models, while the second part applies these representations to practical natural language processing tasks, such as machine translation and morphosyntactic analysis.
Project members
Project managers
Robert Mikael Östling
Docent
Members
Isabelle Augenstein
Associate Professor
Johannes Bjerva
Associate Professor