Title: Coherence Measures do not Predict Summary Coherence
Speaker: Julius Steen (ICL)
Abstract:
Automatic evaluation measures for summary coherence hold the promise of enabling more comprehensive evaluation of
summary linguistic quality as well as improving summarizer coherence. However, there is no clear consensus on which
coherence measures are most appropriate for this task. We thus investigate the efficacy of both general linguistic
quality measures and specialized coherence measures for evaluating summary coherence. Unlike prior work, we evaluate on
a large set of recent summarizer outputs and introduce an evaluation protocol that focuses on fine-grained coherence
judgements instead of coarse-grained system performance. We show that none of the available measures can satisfactorily
predict coherence judgements.