Publications
A selection from Stockholm University publication database-
2019. Murathan Kurfali, Robert Östling. 20th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 226-231
Automatically classifying the relation between sentences in a discourse is a challenging task, in particular when there is no overt expression of the relation. It becomes even more challenging by the fact that annotated training data exists only for a small number of languages, such as English and Chinese. We present a new system using zero-shot transfer learning for implicit discourse relation classification, where the only resource used for the target language is unannotated parallel text. This system is evaluated on the discourse-annotated TEDMDB parallel corpus, where it obtains good results for all seven languages using only English training data.
-
2019. Deniz Zeyrek (et al.). Language resources and evaluation
TED-Multilingual Discourse Bank, or TED-MDB, is a multilingual resource where TED-talks are annotated at the discourse level in 6 languages (English, Polish, German, Russian, European Portuguese, and Turkish) following the aims and principles of PDTB. We explain the corpus design criteria, which has three main features: the linguistic characteristics of the languages involved, the interactive nature of TED talks—which led us to annotate Hypophora, and the decision to avoid projection. We report our annotation consistency, and post-annotation alignment experiments, and provide a cross-lingual comparison based on corpus statistics.
-
2019. Murathan Kurfali, Robert Östling. Proceedings of the Fourth Conference on Machine Translation (WMT), 279-283
We present a very simple method for parallel text cleaning of low-resource languages, based on projection of word embeddings trained on large monolingual corpora in high-resource languages. In spite of its simplicity, we approach the strong baseline system in the downstream machine translation evaluation.