HeiST – Heidelberg Sentiment Treebank
A German dataset for Compositional Sentiment Analysis
HeiST originated in the M.A. project of Michael Haas (Weakly Supervised Learning for Compositional Sentiment Recognition) as a German counterpart to the Stanford Sentiment Treebank, and was constructed in a similar fashion. The textual basis of HeiST is a set of Creative Commons-licensed reviews from the German movie review site Filmrezensionen.de, from which we extracted the evaluation summary ("Fazit") sentences.
HeiST comprises 1184 trees where each node has a sentiment label.
The crowdsourcing of HeiST has been supported in part by the Institute of Computational Linguistics and by Yannick Versley's private funds.
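Since every node of a HeiST tree carries a sentiment label, a reader typically needs to walk the whole tree rather than just read a sentence-level label. The following is a minimal sketch of how that might look, assuming the trees are stored one per line in the bracketed format used by the Stanford Sentiment Treebank, e.g. `(label (label word) (label word))`. That format is an assumption and is not confirmed here, so check the files in the archive first. The names `Node`, `parse_tree` and `iter_nodes` and the example sentence are made up for illustration, and labels are kept as opaque strings because the label inventory is not described on this page.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    label: str                      # sentiment label attached to this node
    word: Optional[str] = None      # set only for leaf nodes
    children: List["Node"] = field(default_factory=list)


def parse_tree(text: str) -> Node:
    """Parse one bracketed tree (ASSUMED SST-style format, hypothetical helper)."""
    # Pad brackets with spaces so that a plain split() yields the tokens.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse() -> Node:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        node = Node(label=tokens[pos])  # first token after '(' is the label
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(parse())  # nested subtree
            else:
                node.word = tokens[pos]        # leaf token
                pos += 1
        pos += 1  # consume ')'
        return node

    root = parse()
    if pos != len(tokens):
        raise ValueError("trailing tokens after tree")
    return root


def iter_nodes(node: Node) -> Iterator[Node]:
    """Yield the node and all its descendants in pre-order."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


# Made-up example line, not taken from HeiST.
tree = parse_tree("(3 (2 Ein) (4 (3 guter) (2 Film)))")
labels = [n.label for n in iter_nodes(tree)]
words = [n.word for n in iter_nodes(tree) if n.word is not None]
print(labels, words)
```

The sketch assumes that literal parentheses inside the text are escaped (for instance as `-LRB-` and `-RRB-`), as is usual for this kind of bracketed notation. If they are not, the tokenization step would need to be adapted.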
Links
HeiST can be downloaded here:
HeiST-1.0.tar.gz.
The code for the experiments can be found in Michael Haas' GitHub project.
For additional background, see the following material:
- Michael Haas and Yannick Versley (2015) Subsentential Sentiment on a Shoestring: A Crosslingual Analysis of Compositional Classification. In Proceedings of NAACL-HLT 2015.
- Michael Haas (2015) Weakly Supervised Learning for Compositional Sentiment Recognition. M.A. Thesis, University of Heidelberg.