CLARIN-D Curation Project "Semantic Annotation for Digital Humanities" (2015 - 2016)
Semantic Annotation for Digital Humanities
The curation project focuses on semantic annotation, particularly on Word Sense Disambiguation (WSD) and Semantic Role Labeling (SRL). Based on previous research, the aims of CP3 are twofold:
Area A: Consolidation and further development of WebAnno for practical use in DH projects
Further development of the web-based annotation tool WebAnno to enable flexible SRL annotation.
- WebAnno 3 was first released in December 2016 and is actively maintained as an open-source project on GitHub: https://webanno.github.io/webanno/
Area B: Curation of resources for semantic annotation and further annotation of the NoSta-D corpus
Creation of a benchmark annotated corpus for German.
- Annotations with VerbNet-style SRL are available in GNVN_semanno, including 3200 annotated predicate-argument structures from the SALSA corpus as well as 450 predicate-argument structures from the Dortmund Chat Corpus.
- Additionally, parallel SRL annotations in the PropBank, FrameNet and VerbNet frameworks are available in SR3de, including 3000 instances of the CoNLL 2009 shared task German data (also included in the SALSA corpus).
Area C: Supporting Shared Tasks for German for selected annotation types
The project aims to support the further development of tools and resources for German-language corpora by supporting shared tasks with suitable objectives. The following shared tasks could be supported:
- GermEval 2014 "Named Entity Recognition Shared Task", organized by Prof. Biemann and Prof. Padó.
- GermEval 2015 "LexSub: Shared Task for German-language Lexical Substitution", organized by Prof. Gurevych and Prof. Biemann.
- Additionally, a joint WSD/SRL shared task for German is planned by Prof. Frank and Prof. Gurevych.