Title: Medical information extraction from German discharge letters from the cardiovascular domain
Speaker: Phillip Richter-Pechanski
Abstract
There is a growing pool of unstructured medical texts in the German language. To use the rich information captured in
these texts for clinical research and for data-driven applications in routine clinical tasks, robust natural language
processing methods for medical information extraction need to be developed and evaluated.
In the first part of this talk I will give a brief overview of challenges and the current state of clinical NLP research
in general and German clinical NLP in particular. I will further describe annotation efforts that we conduct jointly
with our clinical partners at the Cardiology Department of the University Hospital Heidelberg. Based on this data, we
conducted initial experiments using a German BERT model for entity recognition and classification of thirteen clinical
concepts, achieving an F1-score of 88.1%, which outperforms our baseline models by 1.7 (CRF) and 5.9 (BiLSTM) percentage
points.
The second part of this presentation will focus on the current state of my thesis work. I will describe our three
clinical information extraction tasks on German discharge letters: two text classification tasks, (1) text
segmentation and (2) medication indication extraction, and one entity recognition task, (3) cardiovascular concept
extraction. After defining our main goals, namely keeping manual annotation effort low and involving the domain
expertise of physicians, I will address the question of whether current prompt-based methods in few-shot learning
scenarios, combined with domain- and task-based pre-training of our language models, can match or even outperform our
current supervised learning-based methods.