Institut für Computerlinguistik


Title: Medical information extraction from German discharge letters from the cardiovascular domain

Speaker: Phillip Richter-Pechanski

Abstract

There is a growing pool of unstructured medical texts in German. To use the rich information captured in these texts in clinical research and in data-driven applications for routine clinical tasks, robust natural language processing methods for medical information extraction need to be developed and evaluated. In the first part of this talk, I will give a brief overview of the challenges and the current state of clinical NLP research in general and of German clinical NLP in particular. I will further describe the annotation efforts that we conduct jointly with our clinical partners at the Cardiology department of the University Hospital Heidelberg. Based on this data, we conducted initial experiments using a German BERT model for entity recognition and classification of thirteen clinical concepts, achieving an F1 score of 88.1%, which outperforms our baseline models by 1.7 (CRF) and 5.9 (BiLSTM) percentage points. The second part of this presentation will focus on the current state of my thesis work. I will describe our three clinical information extraction tasks on German discharge letters: two text classification tasks, (1) text segmentation and (2) medication indication extraction, and one entity recognition task, (3) cardiovascular concept extraction. After defining our main goals, namely keeping manual annotation effort low and involving the domain expertise of physicians, I will address the question of whether current prompt-based methods in few-shot learning scenarios, combined with domain- and task-based pre-training of our language models, can match or even outperform our current supervised learning methods.