Ruprecht-Karls-Universität Heidelberg

What Causes the Failure of Explicit to Implicit Discourse Relation Recognition?

Abstract

We consider an unanswered question in the discourse processing community: why do classifiers trained on explicit examples (with connectives removed) perform poorly in real implicit scenarios? Prior work attributed this to linguistic dissimilarity between explicit and implicit examples but provided no empirical evidence. In this study, we show that one cause of this failure is a label shift after connectives are eliminated. Specifically, we find that the discourse relations expressed by some explicit instances change once their connectives are removed. Unlike previous work, which manually analyzed a few examples, we present empirical evidence at the corpus level for the existence of such a shift. We then investigate two strategies to mitigate label shift: filtering out noisy data and joint learning with connectives. Classifiers trained with our strategies significantly outperform strong baselines. More importantly, our method also works well on other discourse frameworks, such as RST, even though it was designed based on an analysis of PDTB corpora.
