Ruprecht-Karls-Universität Heidelberg
Institut für Computerlinguistik


Mechanistic Interpretability

Module Description

Course               Module Abbreviation   Credit Points
BA-2010[100%|75%]    CS-CL                 6 LP
BA-2010[50%]         BS-CL                 6 LP
BA-2010[25%]         BS-AC                 4 LP
BA-2010              AS-CL                 8 LP
Master               SS-CL-TAC             8 LP
Lecturer             Frederick Riemenschneider
Module Type          Proseminar / Hauptseminar
Language             English
First Session        17.10.2024
Time and Place       Thursdays, 13:15 - 14:45, INF 329 / SR 26
Commitment Period    TBD

Prerequisites for Participation

  • Completion of Programming I and Introduction to Computational Linguistics, or similar introductory courses
  • Programming II, Mathematical Foundations of Computational Linguistics, and Statistics are strongly recommended

Assessment

  • Active participation
  • Presentation
  • Implementation project

Content

The emergence of very large language models has rendered many conventional approaches in NLP largely obsolete, such as devising tricks to gain a few percentage points on benchmarks. In many cases, it is no longer feasible to fine-tune these models using traditional methods. As a result, the focus of research in NLP has shifted.

One promising area of research is Mechanistic Interpretability, which aims to reverse-engineer the computational mechanisms and representations learned by neural networks into human-understandable algorithms and concepts. The goal is a granular, causal understanding of how these models operate, which makes the field an exciting avenue for advancing our knowledge of language models.

This seminar will provide an overview of the key papers and ideas in the field of Mechanistic Interpretability, highlighting the progress made thus far and the potential future directions for research.

