Information Extraction and Applications
Module Description
Course | Module Abbreviation | Credit Points |
---|---|---|
BA-2010[100%\|75%] | CS-CL | 6 LP |
BA-2010[50%] | BS-CL | 6 LP |
BA-2010 | AS-CL | 8 LP |
Master | SS-CL, SS-TAC | 8 LP |

Lecturer | Daniel Dahlmeier |
---|---|
Module Type | |
Language | English |
First Session | 16.04.2021 |
Time and Place | Friday, 09:15-10:45, Online |
Commitment Period | tbd. |
Prerequisite for Participation
Assessment
Content
This seminar focuses on information extraction (IE) and its applications to business documents. After an overview of traditional IE methods, we will discuss recent research on IE from form-like business documents, such as invoices or purchase orders.
Students will be assigned research papers to study and present in the seminar.
Module Overview
Agenda
Date | Session | Materials |
---|---|---|
16.04.2021 9:15–10:45 | | Lecture slides and recording available on Moodle. |
23.04.2021 9:15–10:45 | | Lecture slides and recording available on Moodle. |
30.04.2021 9:15–10:45 | | Lecture slides and recording available on Moodle. |
07.05.2021 9:15–10:45 | | |
14.05.2021 9:15–10:45 | | |
21.05.2021 9:15–10:45 | | |
28.05.2021 9:15–10:45 | | |
04.06.2021 9:15–10:45 | | |
11.06.2021 9:15–10:45 | | |
18.06.2021 9:15–10:45 | | |
25.06.2021 9:15–10:45 | | |
02.07.2021 9:15–10:45 | | |
09.07.2021 9:15–10:45 | | |
16.07.2021 9:15–10:45 | | |
23.07.2021 9:15–10:45 | | |
Literature
- Jurafsky and Martin. 2020. Speech and Language Processing
- Tjong Kim Sang and De Meulder. 2003. Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition
- Florian et al. 2003. Named Entity Recognition through Classifier Combination
- Huang et al. 2015. Bidirectional LSTM-CRF Models for Sequence Tagging
- Lample et al. 2016. Neural Architectures for Named Entity Recognition
- Akbik et al. 2018. Contextual String Embeddings for Sequence Labeling
- Yamada et al. 2020. LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention
- Zhou et al. 2005. Exploring various knowledge in relation extraction. ACL.
- Snow et al. 2005. Learning syntactic patterns for automatic hypernym discovery. NeurIPS
- Surdeanu. 2013. Overview of the TAC2013 Knowledge Base Population evaluation: English slot filling and temporal slot filling. TAC-13.
- Riedel et al. 2013. Relation Extraction with Matrix Factorization and Universal Schemas
- Zhang et al. 2017. Position-aware attention and supervised data improve slot filling. EMNLP.
- Joshi et al. 2020. SpanBERT: Improving Pre-training by Representing and Predicting Spans
- Qian et al. 2019. GraphIE: A Graph-Based Framework for Information Extraction
- Katti et al. 2018. Chargrid: Towards Understanding 2D Documents.
- Liu et al. 2019. Graph Convolution for Multimodal Information Extraction from Visually Rich Documents
- Denk and Reisswig. 2019. BERTgrid: Contextualized embedding for 2D document representation and understanding
- Majumder et al. 2020. Representation Learning for Information Extraction from Form-like Documents
- Xu et al. 2020. LayoutLM: Pre-training of Text and Layout for Document Image Understanding
- Li et al. 2020. DocBank: A Benchmark Dataset for Document Layout Analysis
- Aggarwal et al. 2020. Form2Seq: A Framework for Higher-Order Form Structure Extraction
- Herzig et al. 2020. TAPAS: Weakly Supervised Table Parsing via Pre-training