Clinical text mining tools
Within the Natural Language Processing Research Group, we have developed a set of tools and lexical resources for Swedish.
Here’s a list of some of the clinical text mining tools and resources that we have developed over the years.
Annotated training data
- Annotated clinical text in Swedish for training and evaluation of machine learning tools:
Annoteringar gjorda 2008–2023, ver30.pdf (Annotations made 2008–2023; PDF, 274 kB)
The annotated data consists of clinical text written in Swedish and was annotated during 2008–2023. The descriptions in the document are in Swedish, but they can still be followed, since the annotation classes are given in English and are accompanied by numerical figures.
For more information, contact the author, Hercules Dalianis.
Tools
- Health Bank Deid (HB Deid): a de-identification and pseudonymisation tool for Swedish text
- EasyICD: an ICD-10 diagnosis code assignment tool
- Medical Decompounder for Swedish (zip, 164.3 kB)
Petter Andersson and Amanda Sjöberg, 2016:
Generating and evaluating an automatic mapping between SNOMED-CT and the Swedish extension codes of ICD-10 based on lexical similarities
Master thesis, DSV, Stockholm University.
- DrugView Tools
DrugView Demo is a tool for presenting and exploring drug pairs from Swedish electronic health records from 2009–2010.
Other resources
- SweDeClin-BERT, a de-identified Swedish clinical language model.
Contact Hercules Dalianis for access.
Vakili, T., Lamproudis, A., Henriksson, A. and H. Dalianis, 2022:
Downstream Task Performance of BERT Models Pre-Trained Using Automatically De-Identified Clinical Data
In Proceedings of the 13th International Conference on Language Resources and Evaluation, LREC 2022, Marseille, France, pp. 4245–4252.
- Swedish ICD-10 diagnosis codes in text format
- Swedish negation triggers for NegEx
- Swedish adverse drug event (ADE) word lists (zip, 10.1 kB)
Friedrich, S. and H. Dalianis, 2015:
Adverse Drug Event classification of health records using dictionary based pre-processing and machine learning
In Proceedings of the Sixth International Workshop on Health Text Mining and Information Analysis, Louhi, held in conjunction with EMNLP 2015, Lisbon, Portugal, pp. 121–130.
- Swedish medical abbreviations (zip, 15.1 kB)
Kvist, M. and S. Velupillai, 2014:
SCAN: A Swedish Clinical Abbreviation Normalizer
In International Conference of the Cross-Language Evaluation Forum for European Languages, pp. 62–73. Springer International Publishing.
Master theses and reports
- Lotta Kiefer, 2024:
Instruction-Tuning LLaMA for Synthetic Medical Note Generation: Bridging Data Privacy and Utility in Downstream Tasks.
Master thesis, Saarland University (supervisor: Hercules Dalianis).
- Sonja Remmer, 2021:
Automatic Diagnosis Code Assignment with KB-BERT. ICD Classification Using Swedish Discharge Summaries.
Master thesis, Stockholm University.
- Synnøve Bråten, 2020:
Extending a Synthetic Norwegian Clinical Corpus for De-Identification.
Master thesis, Stockholm University/Karolinska Institutet. The resulting Norwegian synthetic clinical corpus, NorSynthClinical-PHI, is available on GitHub.
Last updated: 2025-12-11
Source: Department of Computer and Systems Sciences, DSV