Computational linguistics – tools

Here you will find natural language processing tools developed by our computational linguistics staff.

To see our corpora and other resources, please turn to this page:

Computational linguistics – corpora and resources

Tools

The tools we currently distribute are the word aligner eflomal and efselab, a part-of-speech (PoS) tagger and named entity recognizer. Older tools are listed further down the page.

Efficient Low-Memory Aligner (eflomal)

eflomal is a highly efficient word alignment tool. It is freely available via GitHub:

To eflomal (github.com)
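As a rough illustration of what a word aligner produces, the sketch below reads alignment links written as space-separated `i-j` pairs, one line per sentence pair, where `i` and `j` are zero-based token positions in the source and target sentence. This is a widely used convention for word alignment files, but it is an assumption here that your eflomal output is in this form, so check the project's README first. The function names and the example sentences are hypothetical and not part of eflomal.

```python
from typing import List, Tuple


def parse_alignment_line(line: str) -> List[Tuple[int, int]]:
    """Parse one line of 'i-j' links into (source_index, target_index) pairs.

    Assumes zero-based indices separated by a hyphen (an assumption about
    the file format, not something taken from eflomal itself).
    """
    links = []
    for token in line.split():
        src, tgt = token.split("-")
        links.append((int(src), int(tgt)))
    return links


def aligned_word_pairs(
    source: List[str], target: List[str], links: List[Tuple[int, int]]
) -> List[Tuple[str, str]]:
    """Turn index links into (source_word, target_word) pairs."""
    return [(source[i], target[j]) for i, j in links]


# Hypothetical sentence pair and alignment line, for illustration only.
source = "the house is small".split()
target = "das Haus ist klein".split()
links = parse_alignment_line("0-0 1-1 2-2 3-3")
pairs = aligned_word_pairs(source, target, links)

for src_word, tgt_word in pairs:
    print(f"{src_word} <-> {tgt_word}")
```

Keeping the links as index pairs rather than word pairs preserves the information needed when the same word occurs more than once in a sentence.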

Publications

Technical details can be found in the following article:

Efficient Word Alignment with Markov Chain Monte Carlo (ufal.mff.cuni.cz/pbml)
(Robert Östling and Jörg Tiedemann, 2016)

Efficient Sequence Labeling (efselab)

efselab is a compiler for sequence labeling tools, aimed at producing accurate and very fast part-of-speech (PoS) taggers and named entity recognizers (NER).

It is freely available via GitHub.

To efselab (github.com)

Publications

A detailed description of the algorithms used along with evaluations can be found in the following paper:

Part of speech tagging: Shallow or deep learning? (nejlt.ep.liu.se)
(Robert Östling, 2018)

Legacy NLP tools

Some of our older tools are still in use. You will find them below.

Stockholm TreeAligner

Stockholm TreeAligner is a tool for aligning and searching parallel treebanks. This tool allows you to create alignment links between corresponding nodes (or words) in two treebanks in different languages.

Stockholm TreeAligner was a collaborative project between the computational linguistics groups at Stockholm University and the University of Zurich.

Explore Stockholm TreeAligner at University of Zurich (cl.uzh.ch)

The Stockholm Tagger (Stagger)

Stagger is a Swedish part-of-speech tagger. Stagger has now been replaced by efselab (see above) but is still available on GitHub:

Stagger on GitHub

Last updated: January 21, 2025

Source: Department of Linguistics