Ruprecht-Karls-Universität Heidelberg
Institut für Computerlinguistik

Time and longitudinal data in summarization and other NLP tasks

Course description

Degree programme   Module code      Credits
BA-2010            AS-FL            8 LP
Master             SS-CL, SS-TAC    8 LP

Instructor         Katja Markert
Course type        Hauptseminar (advanced seminar)
First session      27.04.2017
Time and place     Thu 11:15-12:45, INF 325, SR 24
Commitment period  19.06.17 to 12.07.17

Prerequisites

The seminar is aimed at advanced Bachelor's students and Master's students. The language of instruction is mostly German, even though the description below is written in English.

Course requirements

  • Active participation and preparation of the assigned readings
  • Presentation
  • Project, term paper, or second presentation (depending on interest and enrolment). Projects may be done in groups of two.

Content

This seminar examines how to work with textual data that accumulates over a longer stretch of time. In particular, we will focus on current research on the following topics (with more specific choices to be made according to student interests):

  1. Timeline and real-time summarization: The continuous stream of news and social media data poses new challenges for both (offline) multi-document summarization and (online) alerting of users to important breaking news. We will look at approaches to summarising long-running events into timelines (e.g. to give a concise, dated overview of a long-running civil war). In addition, the TREC competitions on topic detection and topic tracking as well as on real-time summarization address the slightly different problem of updating a user about an ongoing event in a timely manner and with low redundancy, which is especially important in a crisis (see http://www.trec-ts.org/ and http://trecrts.github.io/). Another related task is automated Wikipedia enhancement by the inclusion of novel events. Of particular interest for these topics are the performance of algorithms on dynamic corpora, the questions of relevance and redundancy, and appropriate evaluation frameworks.
  2. Temporal information retrieval: Temporal information retrieval is a vast field. In this seminar we will focus only on the understanding and classification of temporal queries and on models for improving search diversity with regard to time. One (small) impression of the problems involved is given by the recent Temporalia task at http://ntcirtemporalia.github.io.
    [That means we will, for example, ignore temporal markup languages, the curation of web archives, the visualisation of temporal information, etc.]
  3. The effect of longitudinal data in other NLP tasks (IE, lexica, etc.): Whereas NLP has a long-standing interest in domain adaptation, there are fewer papers on the need for temporal adaptation (for example, between 1950s news documents and current ones). We can consider, for example, papers on the effect of time on entity linking and disambiguation (such as "Clinton" having a prior for "Bill" in the 90s and for "Hillary" now). This is especially relevant for social media data, where entity priors are both seasonal and bursty. Another option is lexicon development (for example, connotation shifts over time).
  4. Note that we will NOT discuss the representation of time or temporal annotation in the sense of TempEval or TIMEX. These topics will be discussed in Michael Herweg's seminar. Instead, we will focus on applications that need to take into account data that arises over longer time periods.

Course overview

Seminar schedule

Date  Session  Materials

Literature

Readings in article form will be announced at the start of the semester and are mostly freely available online.
