Ruprecht-Karls-Universität Heidelberg
Institut für Computerlinguistik

The Mystery of In-Context Learning of Large Language Models

Module Description

Course Module Abbreviation Credit Points
Bachelor CL AS-CL 8 LP
Master CL SS-CL-TAC 8 LP
Seminar Informatik BA + MA 4 LP
Anwendungsgebiet Informatik MA 8 LP
Anwendungsgebiet SciComp MA 8 LP
Lecturer Stefan Riezler
Module Type Seminar
Language English
First Session 15.04.2025
Time and Place Tuesday, 11:15 - 12:45
Mathematikon SR10
Commitment Period tbd.

Participants

Advanced Bachelor students and all Master students. Students from Computer Science or Scientific Computing, especially those with the application area Computational Linguistics, are welcome.

Prerequisite for Participation

Good knowledge of statistical machine learning and experience in experimental work.

Assessment

  • 20%: Regular and active participation (discussion of presented papers during seminar sessions)
  • 60%: Oral presentation (30min presentation + 15min discussion; commitment to a presentation by April 22, 2025, by email stating 3 ranked preferences for presentation slots)
  • 20%: Implementation project and written report (required for 8 LP) or written term paper (required for 4 LP) (5 pages, accompanied by a signed declaration of independent authorship; deadline: end of semester)

Content

Large language models (LLMs) have initiated a paradigm shift in machine learning: in contrast to the classic pretraining-then-finetuning approach, using an LLM for a downstream prediction task only requires providing a few demonstrations, known as in-context examples, without updating any model parameters. This in-context learning (ICL) capability of LLMs is intriguing and not yet fully understood.
In this seminar, we will discuss several theoretical and empirical approaches that aim to explain this phenomenon. Depending on the required credit points, students will present and critically discuss the papers and carry out implementation projects that investigate the influence of prompting parameters, such as the ordering, similarity, or structure of in-context examples, on prediction performance.
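
To make the setting concrete, the following is a minimal sketch (not part of the course materials) of the kind of ordering experiment an implementation project might run: a few-shot prompt is assembled from labeled demonstrations, the demonstrations are permuted, and the model's predictions are compared across orderings. The toy sentiment task, the DEMONSTRATIONS list, and the query_llm placeholder are illustrative assumptions; the actual model call would use whichever LLM the project chooses.

# Sketch of an ordering-sensitivity probe for in-context learning.
import random

# Hypothetical labeled demonstrations for a toy sentiment task.
DEMONSTRATIONS = [
    ("The plot was dull and predictable.", "negative"),
    ("A moving performance by the whole cast.", "positive"),
    ("I would not watch this again.", "negative"),
    ("Beautifully shot and sharply written.", "positive"),
]

def build_prompt(demos, query):
    """Concatenate the in-context examples, followed by the unlabeled query."""
    lines = [f"Review: {text}\nSentiment: {label}" for text, label in demos]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

def query_llm(prompt):
    """Placeholder for the actual model call (e.g. a local model or an API)."""
    raise NotImplementedError("plug in the LLM used in the project")

def ordering_sensitivity(query, n_permutations=5, seed=0):
    """Collect one prediction per random permutation of the demonstrations."""
    rng = random.Random(seed)
    predictions = {}
    for _ in range(n_permutations):
        demos = DEMONSTRATIONS[:]
        rng.shuffle(demos)
        order = tuple(text for text, _ in demos)
        predictions[order] = query_llm(build_prompt(demos, query))
    return predictions

If the predictions disagree across permutations, the model is order-sensitive in the sense studied by Lu et al. (2022), one of the papers in the schedule below.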

Schedule

Date Material Presenter
15.4. Orga Riezler
29.4. The Beginnings
Brown et al. (2020). Language Models are Few-Shot Learners. NeurIPS
Further reading:
Radford et al. (2019). Language Models are Unsupervised Multitask Learners. Tech Report
6.5. Implicit Gradient Descent
von Oswald et al. (2023). Transformers Learn In-Context by Gradient Descent. ICML
Further reading:
Finn et al. (2017). Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. ICML
13.5. Dual Form Gradient Descent
Dai et al. (2023). Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers. ACL
Further reading:
Irie et al. (2022). The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention. ICML
20.5. Bayesian Inference
Xie et al. (2022). An Explanation of In-Context Learning as Implicit Bayesian Inference. ICLR
Further reading:
Wies et al. (2023). The Learnability of In-Context Learning. NeurIPS
27.5. Structure Identification
Hahn et al. (2023). A Theory of Emergent In-Context Learning as Implicit Structure Induction. ArXiv
Further reading:
Wang et al. (2023). Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning. ICML
3.6. Nearest Neighbor Regression
Collins et al. (2024). In-Context Learning with Transformers: Softmax Attention Adapts to Function Lipschitzness. NeurIPS
Further reading:
Han et al. (2023). In-Context Learning of Large Language Models Explained as Kernel Regression. ArXiv
10.6. Principal Component Regression
Zhang et al. (2025). Training Dynamics of In-Context Learning in Linear Attention. ArXiv
Further reading:
Kim et al. (2024). Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape. ICML
17.6. Data Leakage
Balloccu et al. (2024). Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs. ACL
Further reading:
Xu et al. (2024). Benchmarking Benchmark Leakage in Large Language Models. Tech Report
24.6. Shortcut Learning
Du et al. (2024). Shortcut Learning of Large Language Models. Communications of the ACM
Further reading:
Tang et al. (2023). Large Language Models Can be Lazy Learners: Analyze Shortcuts in In-Context Learning. ACL
1.7. Memorization
McCoy et al. (2024). Embers of autoregression show how large language models are shaped by the problem they are trained to solve. PNAS
Further reading:
Shanahan (2024). Talking about Large Language Models. Communications of the ACM
8.7. Open Problem: Context Ordering
Liu et al. (2024). Lost in the Middle: How Language Models Use Long Contexts. TACL
Further reading:
Lu et al. (2022). Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Ordering Sensitivity. ACL
15.7. Open Problem: Counterfactual Settings
Wu et al. (2024). Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks. NAACL
Further reading:
Jiang et al. (2024). A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners. EMNLP
22.7. Open Problem: Compositional Reasoning
Dziri et al. (2023). Faith and Fate: Limits of Transformers on Compositionality. NeurIPS
Further reading:
Mondorf et al. (2024). Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models - A Survey. ArXiv