Sequential decision-making algorithms are history-dependent policies: the action chosen at each step depends on the full history of past observations, actions, and rewards. Modern sequence models such as transformers have made it feasible to represent these objects in compact architectures. In this line of work we explore how to meta-train such models from data to encode both known and new sequential decision-making strategies.
The Eluder dimension has long been used to bound the regret of optimistic algorithms in function-approximation regimes. We introduce the dissimilarity dimension, which yields tighter bounds.
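For context, the (ε-)Eluder dimension that such bounds are typically stated in terms of can be sketched as follows; this is the standard definition from the optimism literature, not notation taken from this work:

```latex
% x is \epsilon-dependent on x_1,\dots,x_n with respect to a function
% class F if closeness on the past points forces closeness on x:
\sqrt{\sum_{i=1}^{n} \bigl(f(x_i) - f'(x_i)\bigr)^2} \le \epsilon
\;\Longrightarrow\; |f(x) - f'(x)| \le \epsilon
\qquad \text{for all } f, f' \in F.
% The \epsilon-Eluder dimension of F is the length of the longest
% sequence of points each of which is \epsilon'-independent of its
% predecessors for some \epsilon' \ge \epsilon.
```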
This work provides a theoretically grounded framework for iteratively exploring the space of perturbations in pooled batches so as to maximize a target phenotype under a fixed experimental budget.
In the model selection problem, the objective is to choose, in an online fashion, the best-suited algorithm from a set of candidates for a specific problem instance.
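One standard way to make this concrete is to treat each candidate algorithm as an arm of an adversarial bandit and run an EXP3-style master over them; the sketch below is that generic construction, not the specific method studied in this work:

```python
import numpy as np

def exp3_select(n_algos, rounds, reward_fn, eta=0.1, gamma=0.1):
    """EXP3-style master: keeps a distribution over base algorithms,
    samples one per round, observes only its reward in [0, 1], and
    updates with an importance-weighted reward estimate."""
    rng = np.random.default_rng(0)
    log_w = np.zeros(n_algos)
    picks = []
    for _ in range(rounds):
        soft = np.exp(log_w - log_w.max())
        soft /= soft.sum()
        p = (1 - gamma) * soft + gamma / n_algos  # forced exploration
        p /= p.sum()
        i = rng.choice(n_algos, p=p)
        r = reward_fn(i)
        log_w[i] += eta * r / p[i]  # unbiased importance-weighted update
        picks.append(int(i))
    return picks

# Usage: three base algorithms; the third succeeds with probability 0.9.
means = [0.2, 0.2, 0.9]
reward_rng = np.random.default_rng(1)
picks = exp3_select(3, 500, lambda i: float(reward_rng.random() < means[i]))
```

The master only ever sees the reward of the algorithm it actually ran, which is exactly the partial-feedback structure that makes online model selection harder than offline comparison.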