
Linguistics in Modern NLP
Module Description
Course | Module Abbreviation | Credit Points |
---|---|---|
BA-2010 | AS-CL, AS-FL | 8 LP |
BA-2010[100%\|75%] | CS-CL | 6 LP |
BA-2010[50%] | BS-CL | 6 LP |
BA-2010[25%] | BS-AC, BS-FL | 4 LP |
Master | SS-CL-TAC, SS-SC-FAL | 8 LP |
Lecturer | Katja Markert |
Module Type | Proseminar / Hauptseminar |
Language | English |
First Session | 15.04.2025 |
Time and Place | Tue, 15:15-16:45, INF 326 / SR 27 |
Commitment Period | tbd. |
Participants
This course is suitable for advanced CL Bachelor students and all CL Master students. It is particularly suitable for Bachelor students with a secondary subject in linguistics and for Master students coming from a more language-oriented background.
Students from Scientific Computing or Data and Computer Science need to contact me before they can take part in this seminar.
The number of participants is limited to 14. If more students wish to take part, BA students with a secondary subject in linguistics and MA students with a linguistic background will be given preference.
Prerequisites for Participation
For Bachelor students: ECL, Introduction to Linguistics, and either Formal Syntax or Formal Semantics
For Master students: no formal prerequisites, but a linguistic background is most suitable.
Assessment
- Active participation, including exercises, reviewing other students' work and participating in seminar discussions
- Presentation
- Implementation project (or a linguistic, data-oriented project if suitable). Projects are preferably conducted in pairs. Students taking CL at 25% can write a term paper instead of doing an implementation project.
Content
Good performance in most modern NLP tasks (language modelling, summarization, MT, verbal reasoning) does not require a linguistic component in the algorithm. Large amounts of good data, a large transformer model, and subsequent instruction tuning and/or some variant of reinforcement learning achieve good to excellent performance. In this seminar, we will discuss where linguistics still plays a role (spoiler: rarely in algorithm development for a language model). We will cover the following topics:
- Multilingual language models and cross-lingual transfer
- Linguistic generalisations and their representations in language models
- Work on low-resource languages and dialects, including benchmark datasets
- Meta-linguistic performance of language models (Why do language models perform worse on morphologically complex languages? How well do language models align with human judgements on grammatical constructions? How do language models process generics? LLM-as-a-judge for linguistic data)
- Linguistic analysis of LLM output text
- What can inductive biases in LLMs tell us about language?
Agenda
Date | Session | Materials |
---|---|---|