How to Learn Sequential Decision Making Algorithms from Data.

Abstract

Large transformer models pretrained on diverse data exhibit impressive in-context learning, performing new tasks without explicit training. In this talk, we explore their capabilities in decision-making settings such as bandits and Markov decision processes. We introduce the Decision-Pretrained Transformer (DPT), a simple supervised pretraining approach where a transformer predicts optimal actions from an in-context dataset of past interactions. DPT enables in-context reinforcement learning, including online exploration and offline conservatism, without explicit training for these behaviors. Remarkably, it generalizes beyond the pretraining distribution and adapts to new task structures. Theoretically, DPT approximates Bayesian posterior sampling, yielding provable regret guarantees and faster learning than the algorithms used to generate its training data. Our results highlight a simple yet powerful method for equipping transformers with strong decision-making abilities.
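To make the pretraining recipe concrete, the sketch below illustrates the supervised objective on a toy Bernoulli bandit: tasks are sampled, a random behavior policy generates the in-context dataset, and the transformer is trained with cross-entropy to predict each task's optimal action. The `TransformerPolicy` class, the task distribution, and all hyperparameters are assumptions made for illustration only; this is not the talk's implementation.

```python
# A minimal sketch of DPT-style supervised pretraining on a toy Bernoulli
# bandit. The task distribution, the TransformerPolicy architecture, and the
# hyperparameters here are illustrative assumptions, not the talk's setup.
import torch
import torch.nn as nn

NUM_ARMS, CTX_LEN, D_MODEL = 5, 50, 64


class TransformerPolicy(nn.Module):
    """Maps an in-context dataset of (action, reward) pairs to action logits."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(NUM_ARMS + 1, D_MODEL)  # one-hot action + reward
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, NUM_ARMS)

    def forward(self, context):  # context: (batch, CTX_LEN, NUM_ARMS + 1)
        h = self.encoder(self.embed(context))
        return self.head(h.mean(dim=1))  # logits for the predicted optimal action


def sample_pretraining_batch(batch_size=32):
    """Sample bandit tasks, roll out a uniform behavior policy, label with the optimal arm."""
    means = torch.rand(batch_size, NUM_ARMS)   # a task = a vector of arm means
    optimal = means.argmax(dim=1)              # supervision target: the optimal action
    actions = torch.randint(NUM_ARMS, (batch_size, CTX_LEN))
    rewards = torch.bernoulli(means.gather(1, actions))
    context = torch.cat(
        [nn.functional.one_hot(actions, NUM_ARMS).float(), rewards.unsqueeze(-1)], dim=-1
    )
    return context, optimal


model = TransformerPolicy()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(1_000):  # supervised pretraining: predict the optimal action from context
    context, optimal = sample_pretraining_batch()
    loss = loss_fn(model(context), optimal)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

At deployment, the same model can be conditioned on its own growing interaction history, which is where the in-context exploration and offline conservatism described in the abstract emerge.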

Date
Apr 10, 2025
Event
KAUST Rising Stars Workshop
Location
King Abdullah University of Science and Technology
Aldo Pacchiano
Assistant Professor / Visiting Scientist

My research interests include online learning, reinforcement learning, deep RL, and fairness.