Ruprecht-Karls-Universität Heidelberg

Human Reinforcement Learning: Algorithms and Hands-on Practice


Seminar Plan

Dates (Lecture/Exercise) | Lecture topic                                                   | Exercise (deadline for Exercise N: Monday, 23:59, before Lecture N+1)
23.4./25.4.              | Organization                                                    | -
30.4./2.5.               | Lecture 1: Dynamic Programming Methods + Supplementary Material | Installing Jupyter notebooks
7.5./9.5.                | Lecture 2: Monte Carlo Methods                                  | Exercise 1/MDP
14.5./16.5.              | -                                                               | -
21.5./23.5.              | Lecture 3: Policy Gradient Methods                              | Exercise 2
28.5./30.5.              | Lecture 4: Sequence-to-Sequence Reinforcement Learning          | Holiday
4.6./6.6.                | Lecture 5: Human Reinforcement Learning                         | Exercise 3
11.6./13.6.              | Presentation 1 (P. Wiesenbach) + Interactive session            | Presentation 2 (M. Holzinger) + Interactive session
18.6./-                  | Presentation 3 (A. Raptakis) + Interactive session              | Holiday
25.6./27.6.              | Presentation 4 (D. Siljak) + Interactive session                | Presentation 11 (N. Berger) + Interactive session
2.7./4.7.                | Presentation 6 (R. Hubert) + Interactive session                | Presentation 7 (M. Staniek) + Interactive session
9.7./11.7.               | Presentation 10 (S. Dubey) + Interactive session                | Presentation 9 (B. Beilharz) + Interactive session
16.7./18.7.              | -                                                               | -
23.7./25.7.              | Interactive session                                             | Interactive session
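As a taste of the material in Lecture 1 (Dynamic Programming Methods) and Exercise 1/MDP, the sketch below runs value iteration on a small toy MDP. The MDP itself (its states, actions, transitions, and rewards) is invented for illustration and is not taken from the course material.

```python
# Illustrative sketch: value iteration on a hypothetical two-state MDP.
# P[s][a] lists (probability, next_state, reward) triples for action a in state s.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "move": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "move": [(1.0, 0, 0.0)]},
}
gamma = 0.9  # discount factor (chosen arbitrarily for the example)

# Initialize the value function to zero and repeatedly apply the
# Bellman optimality operator until (approximate) convergence.
V = {s: 0.0 for s in P}
for _ in range(200):
    V = {
        s: max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
            for a in P[s]
        )
        for s in P
    }

# Extract the greedy policy with respect to the converged values.
policy = {
    s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
    for s in P
}
print(policy)  # the optimal policy moves to state 1 and stays there
```

Here state 1 pays a reward of 2 per step, so V(1) converges to 2/(1-0.9) = 20 and the greedy policy is to move to state 1 and stay.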

Reading list

  1. Kreutzer et al. (2017). Bandit Structured Prediction for Neural Sequence-to-Sequence Learning. ACL
  2. Nguyen et al. (2017). Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback. EMNLP
  3. Swaminathan & Joachims (2015). Counterfactual Risk Minimization: Learning from Logged Bandit Feedback. ICML
  4. Lawrence et al. (2017). Counterfactual Learning from Bandit Feedback under Deterministic Logging: A Case Study in Statistical Machine Translation. EMNLP
  5. Judah et al. (2010). Reinforcement Learning via Practice and Critique Advice. AAAI
  6. Knox & Stone (2009). Interactively shaping agents via human reinforcement: the TAMER framework. K-CAP
  7. Warnell et al. (2018). Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces. AAAI
  8. MacGlashan et al. (2017). Interactive Learning from Policy-Dependent Human Feedback. ICML
  9. Christiano et al. (2017). Deep reinforcement learning from human preferences. NIPS
  10. Ibarz et al. (2018). Reward learning from human preferences and demonstrations in Atari. NIPS
  11. Kreutzer et al. (2018). Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning. ACL