Ruprecht-Karls-Universität Heidelberg

Title: A Simple and Effective Hybrid Model for Cross-lingual Summarization of Long Texts

Speaker: Mehwish Fatima (HITS)

Abstract

We present a simple and effective hybrid model for cross-lingual summarization of long texts. A long document contains several hundred to several thousand words, which poses an obstacle for current abstractive summarization models, since recent neural abstractive summarization models impose restrictions on document length and vocabulary size. We propose a hybrid summarization model that handles the document length and uses an optimized vocabulary construction. We empirically evaluate the proposed model on a real cross-lingual dataset harvested from Wikipedia for English and German. Our proposed models show significant improvements over various cross-lingual baselines. We also present further evaluations with data and model variations to investigate their impact on performance.