Time and longitudinal data in summarization and other NLP tasks
Course description
Degree programme | Module code | Credit points (LP) |
---|---|---|
BA-2010 | AS-FL | 8 LP |
Master | SS-CL, SS-TAC | 8 LP |
Instructor | Katja Markert |
Course type | Hauptseminar (advanced seminar) |
First session | 27.04.2017 |
Time and place | Thu. 11:15–12:45, INF 325 SR 24 |
Commitment period: | 19.06.17 to 12.07.17 |
Prerequisites
The seminar is aimed at advanced Bachelor's students and Master's students. The language of instruction is mostly German, even though the description below is written in English.
Assessment
- Active participation and preparation of the assigned readings
- Presentation
- Project, term paper, or second presentation (depending on interest and seminar capacity). Projects may be done in groups of two.
Content
This seminar looks at textual data that arises over a longer stretch of time. In particular, we will focus on current research on the following topics (with more specific choices to be made according to student interests):
- Timeline and real-time summarization: The continuous stream of news and social media data poses new challenges for both (offline) multi-document summarization and (online) alerting of users to important breaking news. We will look at approaches to summarizing longer-running events into timelines (e.g. to give a concise, dated overview of a long-running civil war). In addition, the TREC competitions on topic detection and topic tracking, as well as on real-time summarization, address the slightly different problem of updating a user about an ongoing event in a timely manner and with low redundancy, which is especially important in a crisis (see http://www.trec-ts.org/ and http://trecrts.github.io/). Another related task is automated Wikipedia enhancement through the inclusion of novel events. Of particular interest for these topics are the performance of algorithms on dynamic corpora, the questions of relevance and redundancy, and appropriate evaluation frameworks.
- Temporal information retrieval: Temporal information retrieval is a vast field. In this seminar we will focus only on the understanding and classification of temporal queries and on models for improving search diversity with regard to time. A (small) impression of the problems is given by the recent Temporalia task at http://ntcirtemporalia.github.io. [That means we will, for example, ignore temporal markup languages, the curation of web archives, visualization of temporal information, etc.]
- The effect of longitudinal data in other NLP tasks (IE, lexica, etc.): Whereas NLP has a long-standing interest in domain adaptation, there are fewer papers on the necessity of temporal adaptation (for example, between working on 1950s news documents and current ones). We can consider, for example, papers on the effect of time on entity linking and disambiguation (such as "Clinton" having a prior for "Bill" in the 90s and for "Hillary" now). This is especially relevant for social media data, where entity priors are both seasonal and bursty. Another option is lexicon development (for example, connotation shifts over time).
- Note that we will NOT discuss the representation of time or temporal annotation in the sense of TempEval or TIMEX. These topics will be discussed in Michael Herweg's seminar. Instead we will focus on applications that need to take into account data that arises over longer time periods.
Course overview
Seminar schedule
Date | Session | Materials |
Literature
Reading material in the form of articles will be announced at the start of the semester and is mostly freely available online.