Introduction
A project funded within the Sonderforschungsbereich SFB 619: "Ritual Dynamics"
Joint project of
- Prof. Dr. Anette Frank, Department of Computational Linguistics
- Prof. Dr. Axel Michaels, South-Asia Institute, Department of Classical Indology, University of Heidelberg
Researchers:
- Nils Reiter, Department of Computational Linguistics
- Anand Mishra, South-Asia Institute, Department of Classical Indology, University of Heidelberg
- Oliver Hellwig, South-Asia Institute, Department of Classical Indology, University of Heidelberg
Outcomes
Domain-Adaptable Deep Linguistic Processing Pipeline
We have developed a UIMA-based processing pipeline that automatically annotates information on various linguistic levels. The following list gives each analysis level and the tool we integrated for it:
- Tokenization: Heuristically, based on character classes
- Sentence splitting: MorphAdorner
- Part-of-speech tagging: OpenNLP
- Chunking: OpenNLP
- Lemmatization: Stanford CoreNLP
- Dependency parsing: Mate Parser
- Word sense disambiguation: UKB
- Coreference resolution: BART
- Semantic role labeling: Semafor
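The tokenization step is described above only as heuristic and based on character classes. The following Python sketch illustrates that general idea and is not the project's actual rule set: the class names, the mapping from Unicode categories to classes, and the function names `char_class` and `tokenize` are all assumptions made for illustration.

```python
import unicodedata


def char_class(ch: str) -> str:
    """Map a character to a coarse class (hypothetical classes, for illustration)."""
    if ch.isspace():
        return "space"
    category = unicodedata.category(ch)
    if category[0] in ("L", "M"):
        # Letters and combining marks, so transliterated words like "dakṣiṇā" stay whole.
        return "word"
    if category[0] == "N":
        return "number"
    # Everything else (punctuation, symbols, ...) is treated as punctuation.
    return "punct"


def tokenize(text: str) -> list[str]:
    """Split text wherever the character class changes; drop whitespace runs."""
    tokens = []
    current = ""
    current_class = None
    for ch in text:
        cls = char_class(ch)
        # Start a new token on a class change; punctuation marks never merge.
        if cls != current_class or cls == "punct":
            if current and current_class != "space":
                tokens.append(current)
            current = ""
        current += ch
        current_class = cls
    if current and current_class != "space":
        tokens.append(current)
    return tokens


print(tokenize("The priest gives the dakṣiṇā."))
# ['The', 'priest', 'gives', 'the', 'dakṣiṇā', '.']
```

A real tokenizer for this domain would likely need further rules, for example for abbreviations or hyphenated compounds, which this sketch does not attempt.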
Adaptation
Many components have been adapted to the ritual domain by retraining statistical models; see Reiter et al. (2010), Frank et al. (2012) and Reiter (2013) for details.
Discourse Representation
The outcome of the processing pipeline is an XML-based discourse representation tailored to our needs. The following class diagram shows the most important classes.
Search
We provide a search tool that allows searching for n-grams of events and inspecting individual results as well as aggregated statistics. The aggregated statistics show the relative position of the search terms within their ritual sequence. The following picture shows the position distribution for the event subsequence giving the dakṣiṇā:
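How an event n-gram search with relative-position statistics might be computed can be sketched in Python as follows. This is an illustration under stated assumptions, not the search tool's implementation: the event names, the function names, and the definition of relative position (start index of the match divided by the last possible start index, so 0.0 is the beginning and 1.0 the end of a sequence) are all hypothetical.

```python
from typing import List, Sequence


def find_ngram(sequence: Sequence[str], ngram: Sequence[str]) -> List[int]:
    """Return every start index at which `ngram` occurs contiguously in `sequence`."""
    n = len(ngram)
    return [
        i
        for i in range(len(sequence) - n + 1)
        if tuple(sequence[i:i + n]) == tuple(ngram)
    ]


def relative_positions(
    sequences: Sequence[Sequence[str]], ngram: Sequence[str]
) -> List[float]:
    """Relative position of each match within its own sequence (assumed definition)."""
    positions = []
    for seq in sequences:
        last_start = len(seq) - len(ngram)  # last index at which a match could begin
        for i in find_ngram(seq, ngram):
            positions.append(i / last_start if last_start > 0 else 0.0)
    return positions


def histogram(positions: Sequence[float], bins: int = 5) -> List[int]:
    """Count positions in equal-width bins over [0, 1]."""
    counts = [0] * bins
    for p in positions:
        counts[min(int(p * bins), bins - 1)] += 1
    return counts


# Hypothetical event sequences, one list of event labels per ritual description.
rituals = [
    ["arrive", "bathe", "worship", "give", "receive"],
    ["bathe", "worship", "feed", "give", "receive"],
    ["give", "receive", "depart"],
]

positions = relative_positions(rituals, ["give", "receive"])
print(positions)             # [1.0, 1.0, 0.0]
print(histogram(positions))  # [1, 0, 0, 0, 2]
```

Normalising by sequence length makes matches from rituals of different lengths comparable, which is what a position distribution over many ritual descriptions requires.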
Visualisation Tools
Based on the integrated discourse representation produced by the processing pipeline, we developed a number of visualisation tools that let researchers inspect points of interest in a targeted way.
Entity Graph
The entity graph shows participants of rituals in a graph-based form. Each participant is represented as a vertex in a graph. Two vertices are connected if they appear in the same event. The vertices are directly linked to their appearance in the source texts, as shown on the right.
Alignment Graph
The alignment graph shows alignments between event sequences. Each node (connected in red or blue) represents an event in one of the sequences, showing the frame name in bold and the lemma in parentheses. Furthermore, each frame is connected with its frame element fillers, to the left or right respectively. The colors of the frame element fillers represent discourse entities. The links between two events are shown in the middle and represent alignments between the events. The two sequences can be moved interactively in this web-based visualisation.
Connectivity Graph
This graph shows the connectivity to the other sequence for each node, organized by the node sequence. Higher scores mean that the node (and its context) is more directly connected to the other sequence. The highest scores have been marked with their id. The areas enclosed in dotted lines show a subsequence of events happening in the middle of one sequence and at the end of the other.
Dissertation
Nils Reiter: Discovering Structural Similarities in Narrative Texts using Event Alignment Algorithms, 2013, defended.
Talks
- Nils Reiter and Anette Frank: Lightning talk and poster presentation at the Herrenhausen Conference: (Digital) Humanities Revisited – Challenges and Opportunities in the Digital Age, Hannover, December 5-7, 2013.
- Anette Frank and Nils Reiter: Invited talk at the First AMICUS Workshop (Automated Motif Discovery in Cultural Heritage and Scientific Communication Texts), Vienna, October 21st, 2010.
Publications
- Frank, A., Bögel, T., Hellwig, O., and Reiter, N. (2012): Semantic Annotation for the Digital Humanities - Using Markov Logic Networks for Annotation Consistency Control. Linguistic Issues in Language Technology, 7, 1-21.
- Reiter, N., Hellwig, O., Frank, A., Gossmann, I., Larios, B. M., Rodrigues, J., and Zeller, B. (2011): Adapting NLP Tools and Frame-Semantic Resources for the Semantic Analysis of Ritual Descriptions. In: Sporleder, C., van den Bosch, A., and Zervanou, K. A. (eds.), Language Technology for Cultural Heritage, Foundations of Human Language Processing and Technology, Springer.
- Reiter, N., Hellwig, O., and Frank, A. (2011): Semi-Automatic Semantic Analysis of Rituals: Chances and Challenges. In: Felder, E., Müller, M., and Vogel, F. (eds.), Thematische Korpora als Basis diskurslinguistischer Analysen von Texten und Gesprächen, Korpuspragmatik, De Gruyter.
- Reiter, N., Hellwig, O., Mishra, A., Gossmann, I., Larios, B. M., Rodrigues, J., Zeller, B., and Frank, A. (2010): Adapting Standard NLP Tools and Resources to the Processing of Ritual Descriptions. Proceedings of the ECAI 2010 Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), Lisbon.
- Reiter, N., Hellwig, O., Mishra, A., Frank, A., and Burkhardt, J. (2010): Using NLP Methods for the Analysis of Rituals. Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC 2010), Valletta, Malta.