CSIRO Research Directions in Scientific Literature Information Extraction
Abstract
For scientists seeking to capitalise on the knowledge and data reported in scientific publications, searching, browsing, reading, and analysing the literature can be costly, time-consuming activities, a burden exacerbated by the rapid pace of publication. Language technologies such as Natural Language Processing (NLP) and Information Retrieval (IR) offer valuable assistance in addressing this challenge. In this seminar, I will present an overview of scientific literature Information Extraction (IE) research undertaken by the Language Technology team of the Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia’s national science organisation. With the goal of exploring new possibilities for scientific literature tools, the team positions its research within the multidisciplinary context of the CSIRO research environment. This context offers informative examples of novel application scenarios and data sets where such tools might be helpful. I will present examples of these applications and describe how they inform the team’s current research directions, such as building on recent advances in Transformer neural networks and Large Language Models for specific science workflows.