Title: BERTCoherence: Evaluating Text Generation with BERT and Discourse Coherence
Speaker: Wei Zhao (HITS)
Abstract
There is growing interest in developing text generation systems that target discourse coherence, e.g., by modeling the
interdependence between sentences. Recently, BERT-based metrics have become popular in system evaluation. While strong
in modeling semantics, they cannot recognize coherence and thus fail to penalize incoherent elements in system outputs. In
this work, we introduce two unsupervised reference-based evaluation metrics, FocusDiff and SentGraph, for summarization
and document-level machine translation (MT), both of which use BERT to model discourse coherence according to Centering
theory, which formulates coherence through the lens of readers' focus in a text. To interpret them, we analyze the two
regularities that our metrics rely on, measuring how well each distinguishes hypotheses from references. Our experiments encompass
14 non-discourse and discourse metrics (including ours), as well as coherence models from the discourse community
repurposed as metrics. We show that (i) previous BERT-based metrics do not correlate with human-rated coherence, performing
even worse than early attempts at discourse metrics (Wong and Kit, 2012), and (ii) there exists a strong
relation between the regularities and the performance of the metrics: the more discriminative the regularities are, the
better our metrics perform. This encourages future research into discovering other novel regularities for better,
self-interpretable metrics.