Sequential decision-making algorithms are history-dependent policies: the action chosen at each step depends on the full history of past observations, actions, and rewards. Modern sequence models such as transformers have made it feasible to represent these objects in compact architectures. In this line of work we explore how to meta-train such models from data to encode both known and new sequential decision-making strategies.
The Eluder dimension has long been used to bound the regret of optimistic algorithms in function-approximation regimes. We introduce the dissimilarity dimension, which yields tighter bounds.
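For context, the (ε-)Eluder dimension that such bounds are typically stated in terms of can be sketched as follows; this is the standard definition from the optimism literature, not notation taken from this work:

```latex
% x is \epsilon-dependent on x_1,\dots,x_n with respect to a function
% class F if closeness on the past points forces closeness on x:
\sqrt{\sum_{i=1}^{n} \bigl(f(x_i) - f'(x_i)\bigr)^2} \le \epsilon
\;\Longrightarrow\; |f(x) - f'(x)| \le \epsilon
\qquad \text{for all } f, f' \in F.
% The \epsilon-Eluder dimension of F is the length of the longest
% sequence of points each of which is \epsilon'-independent of its
% predecessors for some \epsilon' \ge \epsilon.
```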
This work provides a theoretically grounded framework for iteratively exploring the space of perturbations in pooled batches so as to maximize a target phenotype under a fixed experimental budget.
In the model selection problem, the objective is to choose, in an online fashion, the best-suited algorithm from a set of candidates for a specific problem instance.
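One standard way to make this concrete is to treat each candidate algorithm as an arm of an adversarial bandit and run an EXP3-style master over them; the sketch below is that generic construction, not the specific method studied in this work:

```python
import numpy as np

def exp3_select(n_algos, rounds, reward_fn, eta=0.1, gamma=0.1):
    """EXP3-style master: keeps a distribution over base algorithms,
    samples one per round, observes only its reward in [0, 1], and
    updates with an importance-weighted reward estimate."""
    rng = np.random.default_rng(0)
    log_w = np.zeros(n_algos)
    picks = []
    for _ in range(rounds):
        soft = np.exp(log_w - log_w.max())
        soft /= soft.sum()
        p = (1 - gamma) * soft + gamma / n_algos  # forced exploration
        p /= p.sum()
        i = rng.choice(n_algos, p=p)
        r = reward_fn(i)
        log_w[i] += eta * r / p[i]  # unbiased importance-weighted update
        picks.append(int(i))
    return picks

# Usage: three base algorithms; the third succeeds with probability 0.9.
means = [0.2, 0.2, 0.9]
reward_rng = np.random.default_rng(1)
picks = exp3_select(3, 500, lambda i: float(reward_rng.random() < means[i]))
```

The master only ever sees the reward of the algorithm it actually ran, which is exactly the partial-feedback structure that makes online model selection harder than offline comparison.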