Stockholm University

Maria Skeppstedt, Senior Lecturer

Publications

A selection from Stockholm University's publication database

  • Automatic recognition of disorders, findings, pharmaceuticals and body structures from clinical text: An annotation and machine learning study

    2014. Maria Skeppstedt (et al.). Journal of Biomedical Informatics 49, 148-158

    Article

    Automatic recognition of clinical entities in the narrative text of health records is useful for constructing applications for documentation of patient care, as well as for secondary usage in the form of medical knowledge extraction. There are a number of named entity recognition studies on English clinical text, but less work has been carried out on clinical text in other languages. This study was performed on Swedish health records, and focused on four entities that are highly relevant for constructing a patient overview and for medical hypothesis generation, namely the entities: Disorder, Finding, Pharmaceutical Drug and Body Structure. The study had two aims: to explore how well named entity recognition methods previously applied to English clinical text perform on similar texts written in Swedish; and to evaluate whether it is meaningful to divide the more general category Medical Problem, which has been used in a number of previous studies, into the two more granular entities, Disorder and Finding. Clinical notes from a Swedish internal medicine emergency unit were annotated for the four selected entity categories, and the inter-annotator agreement between two pairs of annotators was measured, resulting in an average F-score of 0.79 for Disorder, 0.66 for Finding, 0.90 for Pharmaceutical Drug and 0.80 for Body Structure. A subset of the developed corpus was thereafter used for finding suitable features for training a conditional random fields model. Finally, a new model was trained on this subset, using the best features and settings, and its ability to generalise to held-out data was evaluated. This final model obtained an F-score of 0.81 for Disorder, 0.69 for Finding, 0.88 for Pharmaceutical Drug, 0.85 for Body Structure and 0.78 for the combined category Disorder + Finding. 
The obtained results, which are in line with or slightly lower than those for similar studies on English clinical text, many of them conducted using a larger training data set, show that the approaches used for English are also suitable for Swedish clinical text. However, a small proportion of the errors made by the model are less likely to occur in English text, showing that results might be improved by further tailoring the system to clinical Swedish. The entity recognition results for the individual entities Disorder and Finding show that it is meaningful to separate the general category Medical Problem into these two more granular entity types, e.g. for knowledge mining of co-morbidity relations and disorder-finding relations.
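Both the inter-annotator agreement and the model evaluation above are reported as F-scores. The following is a minimal sketch of exact-match span F-score. It assumes that entities are represented as hypothetical (start token, end token, label) tuples. The study's precise matching criteria are not stated here, so treat this as an illustration only. Because F-score is symmetric in its two arguments, it can be computed between two annotators without choosing either one as the gold standard.

```python
from typing import Iterable, Optional, Set, Tuple

# Hypothetical representation: (first token, end token exclusive, entity label).
Span = Tuple[int, int, str]


def _filter(spans: Iterable[Span], label: Optional[str]) -> Set[Span]:
    return {s for s in spans if label is None or s[2] == label}


def span_f_score(reference: Iterable[Span], predicted: Iterable[Span],
                 label: Optional[str] = None) -> float:
    """Exact-match F-score; symmetric, so usable between two annotators."""
    ref = _filter(reference, label)
    pred = _filter(predicted, label)
    if not ref and not pred:
        return 1.0  # both sides agree there is nothing to annotate
    true_pos = len(ref & pred)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(ref)
    return 2 * precision * recall / (precision + recall)


annotator_a = [(0, 2, "Disorder"), (5, 6, "Finding"), (8, 9, "Body Structure")]
annotator_b = [(0, 2, "Disorder"), (5, 7, "Finding"), (8, 9, "Body Structure")]
print(span_f_score(annotator_a, annotator_b))             # all categories
print(span_f_score(annotator_a, annotator_b, "Finding"))  # boundary mismatch
```

Under exact matching, a one-token boundary disagreement counts as a complete miss. This is one reason why less clearly delimited categories such as Finding tend to obtain lower agreement.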

  • Cue-based assertion classification for Swedish clinical text: Developing a lexicon for pyConTextSwe

    2014. Sumithra Velupillai (et al.). Artificial Intelligence in Medicine 61 (3), 137-144

    Article

    Objective: The ability of a cue-based system to accurately assert whether a disorder is affirmed, negated, or uncertain is dependent, in part, on its cue lexicon. In this paper, we continue our study of porting an assertion system (pyConTextNLP) from English to Swedish (pyConTextSwe) by creating an optimized assertion lexicon for clinical Swedish. Methods and material: We integrated cues from four external lexicons, along with generated inflections and combinations. We used subsets of a clinical corpus in Swedish. We applied four assertion classes (definite existence, probable existence, probable negated existence and definite negated existence) and two binary classes (existence yes/no and uncertainty yes/no) to pyConTextSwe. We compared pyConTextSwe's performance with and without the added cues on a development set, and improved the lexicon further after an error analysis. On a separate evaluation set, we calculated the system's final performance. Results: Following integration steps, we added 454 cues to pyConTextSwe. The optimized lexicon developed after an error analysis resulted in statistically significant improvements on the development set (83% F-score, overall). The system's final F-scores on an evaluation set were 81% (overall). For the individual assertion classes, F-score results were 88% (definite existence), 81% (probable existence), 55% (probable negated existence), and 63% (definite negated existence). For the binary classifications existence yes/no and uncertainty yes/no, final system performance was 97%/87% and 78%/86% F-score, respectively. Conclusions: We have successfully ported pyConTextNLP to Swedish (pyConTextSwe). We have created an extensive and useful assertion lexicon for Swedish clinical text, which could form a valuable resource for similar studies, and which is publicly available.
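The four assertion classes and the two binary classes above fit together in a simple way. This is a minimal sketch of how negation and uncertainty cues can be combined into the four classes and then collapsed into the binary ones. The cue list is a hypothetical handful of Swedish words, not the pyConTextSwe lexicon. This sketch only looks at cues that precede the target term, whereas pyConText also handles cues that follow it.

```python
from typing import Dict, List, Tuple

# Hypothetical mini-lexicon of Swedish cues; the real pyConTextSwe lexicon is
# far larger and also covers cues that follow the target term.
CUES: Dict[str, str] = {
    "ingen": "negation", "inga": "negation", "inte": "negation",
    "ej": "negation", "utan": "negation",
    "misstänkt": "uncertainty", "trolig": "uncertainty",
    "troligen": "uncertainty", "möjligen": "uncertainty",
}


def classify_assertion(tokens: List[str], target_index: int) -> str:
    """Map the cues preceding a target term to one of four assertion classes."""
    preceding = {CUES.get(t.lower()) for t in tokens[:target_index]}
    negated = "negation" in preceding
    uncertain = "uncertainty" in preceding
    if negated and uncertain:
        return "probable negated existence"
    if negated:
        return "definite negated existence"
    if uncertain:
        return "probable existence"
    return "definite existence"


def binary_labels(assertion: str) -> Tuple[bool, bool]:
    """Collapse a four-way class into (existence, uncertainty)."""
    existence = assertion in ("definite existence", "probable existence")
    uncertainty = assertion.startswith("probable")
    return existence, uncertainty


label = classify_assertion("troligen ingen pneumoni".split(), 2)
print(label, binary_labels(label))
```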

  • Extracting Clinical Findings from Swedish Health Record Text

    2014. Maria Skeppstedt.

    Thesis (Doctoral)

    Information contained in the free text of health records is useful for the immediate care of patients as well as for medical knowledge creation. Advances in clinical language processing have made it possible to automatically extract this information, but most research has, until recently, been conducted on clinical text written in English. In this thesis, however, information extraction from Swedish clinical corpora is explored, particularly focusing on the extraction of clinical findings. Unlike most previous studies, Clinical Finding was divided into the two more granular sub-categories Finding (symptom/result of a medical examination) and Disorder (condition with an underlying pathological process). For detecting clinical findings mentioned in Swedish health record text, a machine learning model, trained on a corpus of manually annotated text, achieved results in line with the obtained inter-annotator agreement figures. The machine learning approach clearly outperformed an approach based on vocabulary mapping, showing that Swedish medical vocabularies are not extensive enough for the purpose of high-quality information extraction from clinical text. A rule and cue vocabulary-based approach was, however, successful for negation and uncertainty classification of detected clinical findings. Methods for facilitating expansion of medical vocabulary resources are particularly important for Swedish and other languages with less extensive vocabulary resources. The possibility of using distributional semantics, in the form of Random indexing, for semi-automatic vocabulary expansion of medical vocabularies was, therefore, evaluated. Distributional semantics does not require that terms or abbreviations are explicitly defined in the text, and it is therefore a suitable method for clinical corpora. Random indexing was shown to be useful for extending vocabularies with medical terms, as well as for extracting medical synonyms and abbreviation dictionaries.
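This is a minimal sketch of Random indexing as described above, intended to make the idea concrete. Each word is assigned a sparse random index vector. A word's context vector is the sum of the index vectors of the words around it, and words with similar contexts end up with similar context vectors. The dimensionality, number of non-zero elements, window size and the toy Swedish corpus are illustrative assumptions, not the settings used in the thesis.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence

SparseVector = Dict[int, int]


def index_vector(dim: int, nonzero: int, rng: random.Random) -> SparseVector:
    """Sparse ternary random vector: half +1, half -1, the rest implicitly zero."""
    positions = rng.sample(range(dim), nonzero)
    return {p: (1 if i % 2 == 0 else -1) for i, p in enumerate(positions)}


def train(sentences: Sequence[Sequence[str]], dim: int = 2000, nonzero: int = 10,
          window: int = 2, seed: int = 0) -> Dict[str, List[float]]:
    rng = random.Random(seed)
    index: Dict[str, SparseVector] = {}
    context: Dict[str, List[float]] = defaultdict(lambda: [0.0] * dim)
    for sentence in sentences:
        for i, word in enumerate(sentence):
            lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                neighbour = sentence[j]
                if neighbour not in index:
                    index[neighbour] = index_vector(dim, nonzero, rng)
                vector = context[word]
                # Add the neighbour's index vector to this word's context vector.
                for pos, val in index[neighbour].items():
                    vector[pos] += val
    return dict(context)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(term: str, space: Dict[str, List[float]], k: int = 5) -> List[str]:
    target = space[term]
    scored = sorted(((cosine(target, vec), w) for w, vec in space.items() if w != term),
                    reverse=True)
    return [w for _, w in scored[:k]]


corpus = [s.split() for s in [
    "pat har feber idag", "pat har temp idag",
    "pat har feber sedan igår", "pat har temp sedan igår",
    "tar alvedon mot smärta", "tar panodil mot smärta",
]]
space = train(corpus)
print(nearest("feber", space, k=1))
```

Because "feber" and "temp" occur in identical contexts in this toy corpus, they receive identical context vectors. None of the words needs to be defined explicitly in the text for this to happen.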

  • Medical vocabulary mining using distributional semantics on Japanese patient blogs

    2014. Magnus Ahltorp (et al.). Proceedings of the 6th International Symposium on Semantic Mining in Biomedicine, 57-62

    Conference

    Random indexing has previously been successfully used for medical vocabulary expansion for Germanic languages. In this study, we used this approach to extract medical terms from a Japanese patient blog corpus. The corpus was segmented into semantic units by a semantic role labeller, and different pre-processing and parameter settings were then evaluated. The evaluation showed that similar settings are suitable for Japanese as for previously explored Germanic languages, and that distributional semantics is equally useful for semi-automatic expansion of Japanese medical vocabularies as for medical vocabularies in Germanic languages.

  • Synonym extraction and abbreviation expansion with ensembles of semantic spaces

    2014. Aron Henriksson (et al.). Journal of Biomedical Semantics 5 (6)

    Article

    Background: Terminologies that account for variation in language use by linking synonyms and abbreviations to their corresponding concept are important enablers of high-quality information extraction from medical texts. Due to the use of specialized sub-languages in the medical domain, manual construction of semantic resources that accurately reflect language use is both costly and challenging, often resulting in low coverage. Although models of distributional semantics applied to large corpora provide a potential means of supporting development of such resources, their ability to isolate synonymy from other semantic relations is limited. Their application in the clinical domain has also only recently begun to be explored. Combining distributional models and applying them to different types of corpora may lead to enhanced performance on the tasks of automatically extracting synonyms and abbreviation-expansion pairs. Results: A combination of two distributional models – Random Indexing and Random Permutation – employed in conjunction with a single corpus outperforms using either of the models in isolation. Furthermore, combining semantic spaces induced from different types of corpora – a corpus of clinical text and a corpus of medical journal articles – further improves results, outperforming a combination of semantic spaces induced from a single source, as well as a single semantic space induced from the conjoint corpus. A combination strategy that simply sums the cosine similarity scores of candidate terms is generally the most profitable out of the ones explored. Finally, applying simple post-processing filtering rules yields substantial performance gains on the tasks of extracting abbreviation-expansion pairs, but not synonyms. The best results, measured as recall in a list of ten candidate terms, for the three tasks are: 0.39 for abbreviations to long forms, 0.33 for long forms to abbreviations, and 0.47 for synonyms. 
Conclusions: This study demonstrates that ensembles of semantic spaces can yield improved performance on the tasks of automatically extracting synonyms and abbreviation-expansion pairs. This notion, which merits further exploration, allows different distributional models – with different model parameters – and different types of corpora to be combined, potentially allowing enhanced performance to be obtained on a wide range of natural language processing tasks.
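The combination strategy that sums cosine similarity scores across semantic spaces is described above. This is a minimal sketch of it, together with the recall-in-a-list-of-k measure. The two tiny two-dimensional "spaces" and the term pair are invented for illustration. Real spaces would be induced by Random Indexing or Random Permutation from a clinical corpus and a journal corpus.

```python
import math
from typing import Dict, List, Sequence, Tuple

Space = Dict[str, Sequence[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ensemble_candidates(query: str, spaces: List[Space], k: int = 10) -> List[str]:
    """Rank candidate terms by the sum of their cosine scores across spaces."""
    totals: Dict[str, float] = {}
    for space in spaces:
        if query not in space:
            continue  # a space that lacks the query contributes nothing
        q = space[query]
        for term, vector in space.items():
            if term != query:
                totals[term] = totals.get(term, 0.0) + cosine(q, vector)
    return sorted(totals, key=lambda t: (-totals[t], t))[:k]


def recall_at_k(pairs: List[Tuple[str, str]], spaces: List[Space], k: int = 10) -> float:
    """Share of (query, expected) pairs whose expected term is in the top k."""
    hits = sum(expected in ensemble_candidates(query, spaces, k) for query, expected in pairs)
    return hits / len(pairs)


# Hypothetical toy spaces standing in for a clinical and a journal corpus.
clinical = {"hjärtinfarkt": (1.0, 0.0), "infarkt": (0.9, 0.1), "ami": (0.6, 0.8), "feber": (0.0, 1.0)}
journals = {"hjärtinfarkt": (1.0, 0.0), "infarkt": (0.5, 0.87), "ami": (0.95, 0.05), "feber": (0.0, 1.0)}
print(ensemble_candidates("hjärtinfarkt", [clinical], k=1))
print(ensemble_candidates("hjärtinfarkt", [clinical, journals], k=1))
```

In this toy example, the clinical space alone ranks "infarkt" first. Summing the scores from both spaces promotes "ami", which illustrates how a second corpus can change the ranking.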

  • Adapting a parser to clinical text by simple pre-processing rules

    2013. Maria Skeppstedt. Proceedings of the 2013 Workshop on Biomedical Natural Language Processing, 98-101

    Conference

    Sentence types typical to Swedish clinical text were extracted by comparing sentence part-of-speech tag sequences in clinical and in standard Swedish text. Parses produced by a syntactic dependency parser trained on standard Swedish were manually analysed for the 33 sentence types most typical to clinical text. This analysis resulted in the identification of eight error types, and for two of these error types, pre-processing rules were constructed to improve the performance of the parser. For all but one of the ten sentence types affected by these two rules, the parsing was improved by pre-processing.
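One way to find sentence types that are typical of clinical text is to compare how often each part-of-speech sequence occurs in the two corpora. This is a minimal sketch that ranks sequences by the ratio of their relative frequency in clinical text to their add-one-smoothed relative frequency in standard text. The ratio measure is an assumption, since the abstract does not say how typicality was scored. The tags follow SUC-style conventions (NN noun, VB verb, PN pronoun, MAD sentence-final punctuation).

```python
from collections import Counter
from typing import List, Sequence, Tuple

TagSeq = Tuple[str, ...]


def typical_sentence_types(clinical: Sequence[Sequence[str]],
                           standard: Sequence[Sequence[str]],
                           top_n: int = 33) -> List[Tuple[TagSeq, float]]:
    clin = Counter(tuple(s) for s in clinical)
    std = Counter(tuple(s) for s in standard)
    n_clin, n_std = len(clinical), len(standard)
    types = len(set(clin) | set(std))
    scores = {}
    for seq, count in clin.items():
        p_clin = count / n_clin
        # Add-one smoothing so types unseen in standard text do not divide by zero.
        p_std = (std[seq] + 1) / (n_std + types)
        scores[seq] = p_clin / p_std
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


clinical_tags = [["NN", "MAD"]] * 3 + [["PN", "VB", "NN", "MAD"]]
standard_tags = [["PN", "VB", "NN", "MAD"]] * 4
print(typical_sentence_types(clinical_tags, standard_tags, top_n=1))
```

In this toy data, the verbless "noun + full stop" pattern, a common shape in clinical notes, comes out as the most typical clinical sentence type.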

  • Annotating named entities in clinical text by combining pre-annotation and active learning

    2013. Maria Skeppstedt. 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), 74-80

    Conference

  • Corpus-Driven Terminology Development: Populating Swedish SNOMED CT with Synonyms Extracted from Electronic Health Records

    2013. Aron Henriksson (et al.). Proceedings of the 2013 Workshop on Biomedical Natural Language Processing (BioNLP 2013), 36-44

    Conference

    The various ways in which one can refer to the same clinical concept need to be accounted for in a semantic resource such as SNOMED CT. Developing terminological resources manually is, however, prohibitively expensive and likely to result in low coverage, especially given the high variability of language use in clinical text. To support this process, distributional methods can be employed in conjunction with a large corpus of electronic health records to extract synonym candidates for clinical terms. In this paper, we exemplify the potential of our proposed method using the Swedish version of SNOMED CT, which currently lacks synonyms. A medical expert inspects two thousand term pairs generated by two semantic spaces (one of which models multiword terms in addition to single words) for one hundred preferred terms of the semantic types disorder and finding.

  • Extending the NegEx Lexicon for Multiple Languages

    2013. Wendy W. Chapman (et al.). Proceedings of the 14th World Congress on Medical and Health Informatics, 677-681

    Conference

    We translated an existing English negation lexicon (NegEx) to Swedish, French, and German and compared the lexicon on corpora from each language. We observed Zipf’s law for all languages, i.e., a few phrases occur a large number of times, and a large number of phrases occur only rarely. Negation triggers “no” and “not” were common for all languages; however, other triggers varied considerably. The lexicon is available in OWL and RDF format and can be extended to other languages. We discuss the challenges in translating negation triggers to other languages and issues in representing multilingual lexical knowledge.

  • Negation Scope Delimitation in Clinical Text Using Three Approaches: NegEx, PyConTextNLP and SynNeg

    2013. Hideyuki Tanushi (et al.). Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), 387-474

    Conference

    Negation detection is a key component in clinical information extraction systems, as health record text contains reasoning in which the physician excludes different diagnoses by negating them. Many systems for negation detection rely on negation cues (e.g. not), but only a few studies have investigated whether the syntactic structure of the sentences can be used for determining the scope of these cues. In this paper, we compare three different systems for negation detection in Swedish clinical text (NegEx, PyConTextNLP and SynNeg), which use different approaches for determining the scope of negation cues. NegEx uses the distance between the cue and the disease, PyConTextNLP relies on a list of conjunctions limiting the scope of a cue, and in SynNeg the boundaries of the sentence units, provided by a syntactic parser, limit the scope of the cues. The three systems produced similar results, detecting negation with an F-score of around 80%, but using a parser had advantages when handling longer, complex sentences or short sentences with contradictory statements.
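The difference between a distance-based scope and a conjunction-limited scope shows up in short sentences with contradictory statements. This is a simplified sketch of both strategies. The trigger and terminator sets are small hypothetical Swedish samples, and the five-token window is an illustrative default. Neither function reproduces NegEx or PyConTextNLP in full: the real systems also handle pseudo-triggers, post-triggers and other details, and the parser-based SynNeg approach is not sketched here.

```python
from typing import List, Set

TRIGGERS: Set[str] = {"ingen", "inga", "inte", "ej", "utan"}  # hypothetical subset
TERMINATORS: Set[str] = {"men", "dock"}  # hypothetical conjunction list


def negated_by_distance(tokens: List[str], target: int, window: int = 5) -> bool:
    """Distance-based: negated if a trigger occurs at most `window` tokens before the target."""
    start = max(0, target - window)
    return any(t.lower() in TRIGGERS for t in tokens[start:target])


def negated_by_conjunction(tokens: List[str], target: int) -> bool:
    """Conjunction-limited: scan backwards; a terminating conjunction closes the scope."""
    for token in reversed(tokens[:target]):
        word = token.lower()
        if word in TERMINATORS:
            return False
        if word in TRIGGERS:
            return True
    return False


sentence = "ingen feber men hosta".split()  # "no fever but cough"
print(negated_by_distance(sentence, 3), negated_by_conjunction(sentence, 3))
```

For "hosta" (cough), the distance rule wrongly reports a negation because "ingen" falls within the window. The conjunction rule stops at "men" (but). Both rules agree that "feber" is negated.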

  • Using text prediction for facilitating input and improving readability of clinical text

    2013. Magnus Ahltorp (et al.). MedInfo 2013, 1149-1149

    Conference

    Text prediction has the potential for facilitating and speeding up the documentation work within health care, making it possible for health personnel to allocate less time to documentation and more time to patient care. It also offers a way to produce clinical text with fewer misspellings and abbreviations, increasing readability. We have explored how text prediction can be used for input of clinical text, and how the specific challenges of text prediction in this domain can be addressed. A text prediction prototype was constructed using data from a medical journal and from medical terminologies. This prototype achieved keystroke savings of 26% when evaluated on texts mimicking authentic clinical text. The results are encouraging, indicating that there are feasible methods for text prediction in the clinical domain.
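The abstract above reports keystroke savings for the prototype. This minimal sketch shows how such a figure can be computed with a frequency-based prefix completer, using one common definition: (characters − keystrokes) / characters. It assumes the following:

- Exactly one key is needed to accept a correct suggestion.
- At least one character must be typed before a suggestion is shown.
- Spaces and punctuation are ignored.

The prototype's actual design may differ on any of these points, and the training words are invented.

```python
from collections import Counter
from typing import Iterable, Optional, Sequence


class PrefixPredictor:
    """Suggests the most frequent known word that starts with the typed prefix."""

    def __init__(self, training_words: Iterable[str]):
        self.freq = Counter(w.lower() for w in training_words)

    def suggest(self, prefix: str) -> Optional[str]:
        candidates = [w for w in self.freq if w.startswith(prefix)]
        if not candidates:
            return None
        # Most frequent first; alphabetical order breaks ties deterministically.
        return min(candidates, key=lambda w: (-self.freq[w], w))


def keystrokes(word: str, predictor: PrefixPredictor) -> int:
    """Typed characters plus one key to accept a correct suggestion."""
    word = word.lower()
    for typed in range(1, len(word)):
        if predictor.suggest(word[:typed]) == word:
            return typed + 1
    return len(word)  # no useful suggestion: the whole word is typed


def keystroke_savings(words: Sequence[str], predictor: PrefixPredictor) -> float:
    chars = sum(len(w) for w in words)
    keys = sum(keystrokes(w, predictor) for w in words)
    return (chars - keys) / chars


predictor = PrefixPredictor(["feber"] * 3 + ["febril", "hosta", "hosta"])
print(keystroke_savings(["feber", "hosta"], predictor))
```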

  • Vocabulary Expansion by Semantic Extraction of Medical Terms

    2013. Maria Skeppstedt, Magnus Ahltorp, Aron Henriksson. Proceedings of the 5th International Symposium on Languages in Biology and Medicine, 63-68

    Conference

    Automatic methods for vocabulary expansion are valuable in supporting the development of terminological resources. Here, we evaluate two methods based on distributional semantics for extracting terms that belong to a certain semantic category. In a list of 1000 terms extracted from a corpus of Swedish medical text, the best method obtains a recall of 0.53 and 0.88, respectively, for identifying 90 terms that are known to belong to the semantic categories Medical Finding and Pharmaceutical Drug.
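The evaluation above measures recall of known category members within a fixed-length extracted list. This minimal sketch ranks candidate terms by cosine similarity to the centroid of a set of seed terms and then computes that recall. The centroid ranking is an assumption chosen for simplicity and is not necessarily either of the two methods evaluated in the paper. The toy vectors are invented.

```python
import math
from typing import Dict, List, Sequence, Set

Space = Dict[str, Sequence[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_category(seeds: Set[str], space: Space, n: int) -> List[str]:
    """Rank non-seed terms by cosine similarity to the centroid of the seed vectors."""
    dim = len(next(iter(space.values())))
    centroid = [sum(space[s][i] for s in seeds) / len(seeds) for i in range(dim)]
    candidates = [t for t in space if t not in seeds]
    candidates.sort(key=lambda t: (-cosine(centroid, space[t]), t))
    return candidates[:n]


def recall(extracted: List[str], known: Set[str]) -> float:
    """Share of known category members found in the extracted list."""
    return len(set(extracted) & known) / len(known)


space = {"alvedon": (1.0, 0.1), "panodil": (0.95, 0.15), "ipren": (0.9, 0.2),
         "feber": (0.1, 1.0), "hosta": (0.2, 0.9)}
known_drugs = {"panodil", "ipren"}
print(recall(expand_category({"alvedon"}, space, n=1), known_drugs))
```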

  • Entity Recognition of Pharmaceutical Drugs in Swedish Clinical Text

    2012. Sidrat ul Muntaha (et al.). Proceedings of the Conference, 77-78

    Conference

    An entity recognition system for expressions of pharmaceutical drugs, based on vocabulary lists from FASS, the Medical Subject Headings and SNOMED CT, achieved a precision of 94% and a recall of 74% when evaluated on assessment texts from Swedish emergency unit health records.
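A vocabulary-based recognizer like the one above has to match list entries, some of which span several tokens, against running text. This minimal sketch uses greedy longest-match lookup for that purpose. The vocabulary entries are a hypothetical handful of lower-cased token tuples, not the FASS, MeSH or SNOMED CT lists. Details such as inflection handling, which matter for Swedish, are left out.

```python
from typing import List, Set, Tuple

# Hypothetical entries, stored as lower-cased token tuples.
VOCABULARY: Set[Tuple[str, ...]] = {
    ("alvedon",), ("insulin",), ("insulin", "glargin"), ("acetylsalicylsyra",),
}


def longest_match(tokens: List[str], vocabulary: Set[Tuple[str, ...]],
                  max_len: int = 4) -> List[Tuple[int, int]]:
    """Greedy left-to-right longest match; returns (start, end exclusive) token spans."""
    lowered = [t.lower() for t in tokens]
    spans: List[Tuple[int, int]] = []
    i = 0
    while i < len(lowered):
        # Try the longest possible entry first, then shorter ones.
        for n in range(min(max_len, len(lowered) - i), 0, -1):
            if tuple(lowered[i:i + n]) in vocabulary:
                spans.append((i, i + n))
                i += n
                break
        else:
            i += 1  # no entry starts here; move on one token
    return spans


tokens = "Pat fick Insulin glargin och Alvedon".split()
print(longest_match(tokens, VOCABULARY))
```

Preferring the longest entry means that "Insulin glargin" is returned as one drug mention rather than as the shorter entry "insulin" followed by an unmatched token.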

  • From Disorder to Order: Extracting clinical findings from unstructured text

    2012. Maria Skeppstedt.

    Thesis (Licentiate)

    Medical disorders and findings are examples of important information in health record text. Through developing methods for automatically extracting these entities from the health record text, the possibility of making use of the information by automatic computerised processes increases. That a disorder or finding is mentioned in the health record, however, does not necessarily imply that it has been observed in the patient, because disorders that are ruled out and findings that are not observed in the patient are also mentioned.

    This licentiate thesis investigates the possibility of automatically extracting disorders and findings from Swedish health record text and the possibility of automatically determining whether these findings and disorders are negated or not.

    A rule- and terminology-based system that uses several Swedish medical terminologies, including SNOMED CT and ICD-10, for extracting disorders, findings and body structures mentioned in Swedish clinical text was constructed and evaluated. Moreover, an English rule-based system for negation detection, NegEx, was adapted to Swedish and evaluated on clinical text written in Swedish.

    The evaluation showed that disorders and findings were recognised with low recall, whereas body structures were recognised with comparatively good results. The negation detection system that was adapted to Swedish achieved the same recall as the English system, but lower precision.

    The evaluated systems are accurate enough to be useful in some applications, but need to be further developed, especially when it comes to recognising disorders and findings.

  • Rule-based Entity Recognition and Coverage of SNOMED CT in Swedish Clinical Text

    2012. Maria Skeppstedt, Maria Kvist, Hercules Dalianis. LREC 2012 8th ELRA Conference on Language Resources and Evaluation, 1250-1257

    Conference

    Named entity recognition of the clinical entities disorders, findings and body structures is needed for information extraction from unstructured text in health records. Clinical notes from a Swedish emergency unit were annotated and used for evaluating a rule- and terminology-based entity recognition system. This system used different preprocessing techniques for matching terms to SNOMED CT, and, one by one, four other terminologies were added. For the class body structure, the results improved with preprocessing, whereas only small improvements were shown for the classes disorder and finding. The best average results were achieved when all terminologies were used together. The entity body structure was recognised with a precision of 0.74 and a recall of 0.80, whereas lower results were achieved for disorder (precision: 0.75, recall: 0.55) and for finding (precision: 0.57, recall: 0.30). The proportion of entities containing abbreviations was higher for false negatives than for correctly recognised entities, and no entities containing more than two tokens were recognised by the system. Low recall for disorders and findings shows both that additional methods are needed for entity recognition and that there are many expressions in clinical text that are not included in SNOMED CT.

  • Stockholm EPR Corpus: A Clinical Database Used to Improve Health Care

    2012. Hercules Dalianis (et al.). Proceedings of SLCT 2012, 17-18

    Conference

    The care of patients is well documented in health records. Despite being a valuable source of information that could be mined by computers and used to improve health care, health records are not readily available for research. Moreover, the narrative parts of the records are noisy and need to be interpreted by domain experts. In this abstract we describe our experiences of gaining access to a database of electronic health records for research. We also highlight some important issues in this domain and describe a number of possible applications, including comorbidity networks, detection of hospital-acquired infections and adverse drug reactions, as well as diagnosis coding support.

  • Synonym Extraction of Medical Terms from Clinical Text Using Combinations of Word Space Models

    2012. Aron Henriksson (et al.). Proceedings of the 5th International Symposium on Semantic Mining in Biomedicine (SMBM 2012), 10-17

    Conference

    In information extraction, it is useful to know if two signifiers have the same or very similar semantic content. Maintaining such information in a controlled vocabulary is, however, costly. Here it is demonstrated how synonyms of medical terms can be extracted automatically from a large corpus of clinical text using distributional semantics. By combining Random Indexing and Random Permutation, different lexical semantic aspects are captured, effectively increasing our ability to identify synonymic relations between terms. 44% of 340 synonym pairs from MeSH are successfully extracted in a list of ten suggestions. The models can also be used to map abbreviations to their full-length forms; simple pattern-based filtering of the suggestions yields substantial improvements.

  • Characteristics of Finnish and Swedish intensive care nursing narratives: a comparative analysis to support the development of clinical language technologies

    2011. Helen Allvin (et al.). Journal of Biomedical Semantics 2 (S1), 1-11

    Article

    Background: Free text is helpful for entering information into electronic health records, but reusing it is a challenge. The need for language technology for processing Finnish and Swedish healthcare text is therefore evident; however, Finnish and Swedish are linguistically very dissimilar. In this paper we present a comparison of characteristics in Finnish and Swedish free-text nursing narratives from intensive care. This creates a framework for characterising and comparing clinical text and lays the groundwork for developing clinical language technologies. Methods: Our material included daily nursing narratives from one intensive care unit in Finland and one in Sweden. Inclusion criteria for patients were an inpatient period of at least five days and an age of at least 16 years. We performed a comparative analysis as part of a collaborative effort between Finnish- and Swedish-speaking healthcare and language technology professionals that included both qualitative and quantitative aspects. The qualitative analysis addressed the content and structure of three average-sized health records from each country. In the quantitative analysis 514 Finnish and 379 Swedish health records were studied using various language technology tools. Results: Although the two languages are not closely related, nursing narratives in Finland and Sweden had many properties in common. Both made use of specialised jargon and their content was very similar. However, many of these characteristics were challenging regarding development of language technology to support producing and using clinical documentation. Conclusions: The way Finnish and Swedish intensive care nursing was documented was not country or language dependent, but shared a common context, principles and structural features and even similar vocabulary elements. Technology solutions are therefore likely to be applicable to a wider range of natural languages, but they need linguistic tailoring. 
Availability: The Finnish and Swedish data can be found at: http://www.dsv.su.se/ hexanord/data/

  • Negation detection in Swedish clinical text: An adaption of NegEx to Swedish

    2011. Maria Skeppstedt. Journal of Biomedical Semantics 2 (S3), 1-12

    Article

    Background: Most methods for negation detection in clinical text have been developed for English text, and there is a need for evaluating the feasibility of adapting these methods to other languages. A Swedish adaption of the English rule-based negation detection system NegEx, which detects negations through the use of trigger phrases, was therefore evaluated. Results: The Swedish adaption of NegEx showed a precision of 75.2% and a recall of 81.9%, when evaluated on 558 manually classified sentences containing negation triggers, and a negative predictive value of 96.5% when evaluated on 342 sentences not containing negation triggers. Conclusions: The precision was significantly lower for the Swedish adaptation than published results for the English version, but since many negated propositions were identified through a limited set of trigger phrases, it could nevertheless be concluded that the same trigger phrase approach is possible in a Swedish context, even though it needs to be further developed. Availability: The triggers used for the evaluation of the Swedish adaption of NegEx are available at http://people.dsv.su.se/~mariask/resources/triggers.txt and can be used together with the original NegEx program for negation detection in Swedish clinical text.
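The negative predictive value above was measured on sentences without negation triggers, and this measure is easy to misread. A trigger-based system can never mark such a sentence as negated, so every trigger-free sentence is predicted "not negated". In that setting, NPV reduces to the share of those sentences that really are not negated. This minimal sketch shows the three measures. The counts in the usage example are hypothetical and do not reproduce the study's data.

```python
from typing import Sequence


def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)


def npv_without_triggers(gold_negated: Sequence[bool]) -> float:
    """NPV on trigger-free sentences, which are all predicted 'not negated'.

    Gold-affirmed sentences are therefore true negatives and gold-negated
    sentences are false negatives.
    """
    fn = sum(1 for negated in gold_negated if negated)
    tn = len(gold_negated) - fn
    return tn / (tn + fn)


# Hypothetical sample: 20 trigger-free sentences, one of which is in fact negated.
print(npv_without_triggers([False] * 19 + [True]))
```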

  • Retrieving disorders and findings: Results using SNOMED CT and NegEx adapted for Swedish

    2011. Maria Skeppstedt, Hercules Dalianis, Gunnar H. Nilsson. LOUHI 2011 Health Document Text Mining and Information Analysis 2011, 11-17

    Conference

    Access to reliable data from electronic health records is of high importance in several key areas in patient care, biomedical research, and education. However, many of the clinical entities are negated in the patient record text. Detecting what is a negation and what is not is therefore a key to high quality text mining. In this study we used the NegEx system adapted for Swedish to investigate negated clinical entities. We applied the system to a subset of free-text entries under a heading containing the word ‘assessment’ from the Stockholm EPR corpus, containing in total 23,171,559 tokens. Specifically, the explored entities were the SNOMED CT terms having the semantic categories ‘finding’ or ‘disorder’. The study showed that the proportion of negated clinical entities was around 9%. The results thus support that negations are abundant in clinical text and hence negation detection is vital for high quality text mining in the medical domain.

  • Creating and Evaluating a Consensus for Negated and Speculative Words in a Swedish Clinical Corpus

    2010. Hercules Dalianis, Maria Skeppstedt. Proceedings of the Workshop on Negation and Speculation in Natural Language Processing (NeSp-NLP 2010), 5-13

    Conference

    In this paper we describe the creation of a consensus corpus that was obtained through combining three individual annotations of the same clinical corpus in Swedish. We used a few basic rules that were executed automatically to create the consensus. The corpus contains negation words, speculative words, uncertain expressions and certain expressions. We evaluated the consensus by using it for negation and speculation cue detection. We used Stanford NER, which is based on the machine learning algorithm Conditional Random Fields, for the training and detection. For comparison, we also trained Stanford NER on the clinical part of the BioScope Corpus. For our clinical consensus corpus in Swedish we obtained a precision of 87.9 percent and a recall of 91.7 percent for negation cues, and for English with the BioScope Corpus we obtained a precision of 97.6 percent and a recall of 96.7 percent for negation cues.
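The abstract above mentions automatic rules for combining three annotations but does not spell them out. This minimal sketch assumes a simple token-level majority vote: a label is kept when at least two of the three annotators chose it, and otherwise the token falls back to an "outside" label. The label names NEG, SPEC and CERT are hypothetical stand-ins for negation words, speculative words and certain expressions, and the paper's actual rules may differ.

```python
from collections import Counter
from typing import List, Sequence


def majority_consensus(annotations: Sequence[Sequence[str]], outside: str = "O") -> List[str]:
    """Token-level consensus: keep a label only if a strict majority chose it."""
    if len({len(a) for a in annotations}) != 1:
        raise ValueError("all annotations must cover the same tokens")
    consensus = []
    for labels in zip(*annotations):
        label, count = Counter(labels).most_common(1)[0]
        consensus.append(label if count * 2 > len(labels) else outside)
    return consensus


a1 = ["NEG", "O", "SPEC"]
a2 = ["NEG", "O", "O"]
a3 = ["O", "O", "CERT"]
print(majority_consensus([a1, a2, a3]))
```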
