Ruprecht-Karls-Universität Heidelberg
Institut für Computerlinguistik

The Mystery of In-Context Learning of Large Language Models

Module Description

Course Module Abbreviation Credit Points
Bachelor CL AS-CL 8 LP
Master CL SS-CL-TAC 8 LP
Seminar Informatik BA + MA 4 LP
Anwendungsgebiet Informatik MA 8 LP
Anwendungsgebiet SciComp MA 8 LP
Lecturer Stefan Riezler
Module Type Seminar
Language English
First Session 15.04.2025
Time and Place Tuesday, 11:15 - 12:45
Mathematikon SR10
Commitment Period tbd.

Participants

Advanced Bachelor students and all Master students. Students from Computer Science or Scientific Computing, especially those with the application area Computational Linguistics, are welcome.

Prerequisite for Participation

Good knowledge of statistical machine learning and experience in experimental work.

Assessment

  • 20%: Regular and active participation (discussion of presented papers during seminar sessions)
  • 60%: Oral presentation (30min presentation + 15min discussion; commitment to a presentation by April 22, 2025, by email stating 3 ranked preferences for presentation slots)
  • 20%: Implementation project and written report (required for 8 LP) or written term paper (required for 4 LP) (5 pages, accompanied by a signed declaration of independent authorship; deadline: end of semester)

Content

Large language models (LLMs) have initiated a paradigm shift in machine learning: in contrast to the classic pretraining-then-finetuning approach, using an LLM for a downstream prediction task only requires providing a few demonstrations, known as in-context examples, without updating any model parameters. This in-context learning (ICL) capability of LLMs is intriguing and not yet fully understood.
In this seminar, we will discuss several theoretical and empirical approaches that aim to explain this phenomenon. Depending on the required credit points, students will present and critically discuss the papers and carry out implementation projects that investigate the influence of prompting parameters, such as the ordering, similarity, or structure of in-context examples, on prediction performance.
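
To make the setting concrete, the following is a minimal sketch (not part of the course materials) of the kind of ordering experiment an implementation project might run: a few-shot prompt is assembled from labeled demonstrations, the demonstrations are permuted, and the model's predictions are compared across orderings. The toy sentiment task, the DEMONSTRATIONS list, and the query_llm placeholder are illustrative assumptions; the actual model call would use whichever LLM the project chooses.

# Sketch of an ordering-sensitivity probe for in-context learning.
import random

# Hypothetical labeled demonstrations for a toy sentiment task.
DEMONSTRATIONS = [
    ("The plot was dull and predictable.", "negative"),
    ("A moving performance by the whole cast.", "positive"),
    ("I would not watch this again.", "negative"),
    ("Beautifully shot and sharply written.", "positive"),
]

def build_prompt(demos, query):
    """Concatenate the in-context examples, followed by the unlabeled query."""
    lines = [f"Review: {text}\nSentiment: {label}" for text, label in demos]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

def query_llm(prompt):
    """Placeholder for the actual model call (e.g. a local model or an API)."""
    raise NotImplementedError("plug in the LLM used in the project")

def ordering_sensitivity(query, n_permutations=5, seed=0):
    """Collect one prediction per random permutation of the demonstrations."""
    rng = random.Random(seed)
    predictions = {}
    for _ in range(n_permutations):
        demos = DEMONSTRATIONS[:]
        rng.shuffle(demos)
        order = tuple(text for text, _ in demos)
        predictions[order] = query_llm(build_prompt(demos, query))
    return predictions

If the predictions disagree across permutations, the model is order-sensitive in the sense studied by Lu et al. (2022), one of the papers in the schedule below.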

Schedule

Date Material Presenter
15.4. Orga Riezler
29.4. The Beginnings
Brown et al. (2020). Language Models are Few-Shot Learners. NeurIPS
Further reading:
Radford et al. (2019). Language Models are Unsupervised Multitask Learners. Tech Report
6.5. Implicit Gradient Descent
von Oswald et al. (2023). Transformers Learn In-Context by Gradient Descent. ICML
Further reading:
Finn et al. (2017). Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. ICML
13.5. Dual Form Gradient Descent
Dai et al. (2023). Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers. ACL
Further reading:
Irie et al. (2022). The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention. ICML
20.5. Bayesian Inference
Xie et al. (2022). An Explanation of In-Context Learning as Implicit Bayesian Inference. ICLR
Further reading:
Wies et al. (2023). The Learnability of In-Context Learning. NeurIPS
27.5. Structure Identification
Hahn et al. (2023). A Theory of Emergent In-Context Learning as Implicit Structure Induction. ArXiv
Further reading:
Wang et al. (2023). Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning. ICML
3.6. Nearest Neighbor Regression
Collins et al. (2024). In-Context Learning with Transformers: Softmax Attention Adapts to Function Lipschitzness. NeurIPS
Further reading:
Han et al. (2023). In-Context Learning of Large Language Models Explained as Kernel Regression. ArXiv
10.6. Principal Component Regression
Zhang et al. (2025). Training Dynamics of In-Context Learning in Linear Attention. ArXiv
Further reading:
Kim et al. (2024). Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape. ICML
17.6. Data Leakage
Balloccu et al. (2024). Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs. ACL
Further reading:
Xu et al. (2024). Benchmarking Benchmark Leakage in Large Language Models. Tech Report
24.6. Shortcut Learning
Du et al. (2024). Shortcut Learning of Large Language Models. Communications of the ACM
Further reading:
Tang et al. (2023). Large Language Models Can be Lazy Learners: Analyze Shortcuts in In-Context Learning. ACL
1.7. Memorization
McCoy et al. (2024). Embers of autoregression show how large language models are shaped by the problem they are trained to solve. PNAS
Further reading:
Shanahan (2024). Talking about Large Language Models. Communications of the ACM
8.7. Open Problem: Context Ordering
Liu et al. (2024). Lost in the Middle: How Language Models Use Long Contexts. TACL
Further reading:
Lu et al. (2022). Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Ordering Sensitivity. ACL
15.7. Open Problem: Counterfactual Settings
Wu et al. (2024). Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks. NAACL
Further reading:
Jiang et al. (2024). A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners. EMNLP
22.7. Open Problem: Compositional Reasoning
Dziri et al. (2023). Faith and Fate: Limits of Transformers on Compositionality. NeurIPS
Further reading:
Mondorf et al. (2024). Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models - A Survey. ArXiv