Computational linguistics – tools

Here you will find natural language processing tools developed by our computational linguistics staff.

To see our corpora and other resources, please turn to this page:

Computational linguistics – corpora and resources

 

Tools

The current tools we distribute include the word aligner eflomal and the PoS-tagger and named entity recognizer efselab. Older tools can be found further down on the page.

eflomal is a highly efficient word alignment tool. It is freely available via GitHub:

To eflomal (github.com)

Publications

Technical details can be found in the following article:

Efficient Word Alignment with Markov Chain Monte Carlo (ufal.mff.cuni.cz/pbml)
(Robert Östling and Jörg Tiedemann, 2016)

efselab is a compiler for sequence labeling tools, aimed at producing accurate and very fast part-of-speech (PoS) taggers and named entity recognizers (NER).

It is freely available via GitHub.

To efselab (github.com)

Publications

A detailed description of the algorithms used along with evaluations can be found in the following paper:

Part of speech tagging: Shallow or deep learning? (nejlt.ep.liu.se)
(Robert Östling, 2018)

 

Legacy NLP tools

Some of our older tools are still in use. You will find them below.

Stockholm TreeAligner is a tool for aligning and searching parallel treebanks. This tool allows you to create alignment links between corresponding nodes (or words) in two treebanks in different languages.

Stockholm TreeAligner was a collaborative project between the Computational Linguistics Groups of Stockholm University and the University of Zürich.

Explore Stockholm TreeAligner at University of Zurich (cl.uzh.ch)

Stagger is a Swedish part-of-speech tagger. Stagger has now been replaced by efselab (see above) but is still available on GitHub:

Stagger on GitHub
