Murathan Kurfali, Researcher


A selection from Stockholm University's publication database

  • Labeling Explicit Discourse Relations Using Pre-trained Language Models

    2020. Murathan Kurfali. Text, Speech, and Dialogue, 79-86


    Labeling explicit discourse relations is one of the most challenging sub-tasks of shallow discourse parsing, where the goal is to identify the discourse connectives and the boundaries of their arguments. The state-of-the-art models achieve an F-score slightly above 45% by using hand-crafted features. The current paper investigates the efficacy of pre-trained language models in this task. We find that pre-trained language models, when fine-tuned, are powerful enough to replace the linguistic features. We evaluate our model on PDTB 2.0 and report state-of-the-art results in the extraction of the full relation. This is the first time a model has outperformed the knowledge-intensive models without employing any linguistic features.

  • Zero-shot transfer for implicit discourse relation classification

    2019. Murathan Kurfali, Robert Östling. 20th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 226-231


    Automatically classifying the relation between sentences in a discourse is a challenging task, in particular when there is no overt expression of the relation. It is made even more challenging by the fact that annotated training data exists only for a small number of languages, such as English and Chinese. We present a new system using zero-shot transfer learning for implicit discourse relation classification, where the only resource used for the target language is unannotated parallel text. This system is evaluated on the discourse-annotated TED-MDB parallel corpus, where it obtains good results for all seven languages using only English training data.

  • TED Multilingual Discourse Bank (TED-MDB)

    2019. Deniz Zeyrek (et al.). Language Resources and Evaluation


    TED-Multilingual Discourse Bank, or TED-MDB, is a multilingual resource where TED talks are annotated at the discourse level in 6 languages (English, Polish, German, Russian, European Portuguese, and Turkish) following the aims and principles of PDTB. We explain the corpus design criteria, which have three main features: the linguistic characteristics of the languages involved, the interactive nature of TED talks (which led us to annotate Hypophora), and the decision to avoid projection. We report our annotation consistency and post-annotation alignment experiments, and provide a cross-lingual comparison based on corpus statistics.

  • A Multi-Word Expression Dataset for Swedish

    2020. Murathan Kurfali (et al.). Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), 4402-4409


    We present a new set of 96 Swedish multi-word expressions annotated with degree of (non-)compositionality. In contrast to most previous compositionality datasets, we also consider syntactically complex constructions and publish a formal specification of each expression. This allows evaluation of computational models beyond word bigrams, which have so far been the norm. Finally, we use the annotations to evaluate a system for automatic compositionality estimation based on distributional semantics. Our analysis of the disagreements between human annotators and the distributional model reveals interesting questions related to the perception of compositionality, and should be informative to future work in the area.

