Profiles

Murathan Kurfali

PhD student

Works at the Department of Linguistics
Phone 08-16 39 38
Email murathan.kurfali@ling.su.se
Visiting address Universitetsvägen 10 C, floors 2-3
Room C 348
Postal address Institutionen för lingvistik, 106 91 Stockholm

Publications

A selection from Stockholm University's publication database
  • 2019. Murathan Kurfali, Robert Östling. Proceedings of the Fourth Conference on Machine Translation (WMT), 279-283

    We present a very simple method for parallel text cleaning of low-resource languages, based on projection of word embeddings trained on large monolingual corpora in high-resource languages. Despite its simplicity, the method approaches the strong baseline system in the downstream machine translation evaluation.

  • 2019. Deniz Zeyrek (et al.). Language Resources and Evaluation

    TED-Multilingual Discourse Bank, or TED-MDB, is a multilingual resource in which TED talks are annotated at the discourse level in six languages (English, Polish, German, Russian, European Portuguese, and Turkish) following the aims and principles of PDTB. We explain the corpus design criteria, which have three main features: the linguistic characteristics of the languages involved, the interactive nature of TED talks (which led us to annotate Hypophora), and the decision to avoid projection. We report annotation consistency and post-annotation alignment experiments, and provide a cross-lingual comparison based on corpus statistics.

  • 2019. Murathan Kurfali, Robert Östling. 20th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 226-231

    Automatically classifying the relation between sentences in a discourse is a challenging task, in particular when there is no overt expression of the relation. It is made even more challenging by the fact that annotated training data exists only for a small number of languages, such as English and Chinese. We present a new system using zero-shot transfer learning for implicit discourse relation classification, where the only resource used for the target language is unannotated parallel text. This system is evaluated on the discourse-annotated TED-MDB parallel corpus, where it obtains good results for all seven languages using only English training data.

Last updated: 26 September 2019
