Human Reinforcement Learning: Algorithms and Hands-on Practice
Seminar plan
Reading list
- Kreutzer et al. (2017). Bandit Structured Prediction for Neural Sequence-to-Sequence Learning. ACL
- Nguyen et al. (2017). Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback. EMNLP
- Swaminathan & Joachims (2015). Counterfactual Risk Minimization: Learning from Logged Bandit Feedback. ICML
- Lawrence et al. (2017). Counterfactual Learning from Bandit Feedback under Deterministic Logging: A Case Study in Statistical Machine Translation. EMNLP
- Judah et al. (2010). Reinforcement Learning via Practice and Critique Advice. AAAI
- Knox & Stone (2009). Interactively Shaping Agents via Human Reinforcement: The TAMER Framework. K-CAP
- Warnell et al. (2018). Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces. AAAI
- MacGlashan et al. (2017). Interactive Learning from Policy-Dependent Human Feedback. ICML
- Christiano et al. (2017). Deep Reinforcement Learning from Human Preferences. NIPS
- Ibarz et al. (2018). Reward Learning from Human Preferences and Demonstrations in Atari. NeurIPS
- Kreutzer et al. (2018). Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning. ACL