The Eluder dimension has long been used to derive regret bounds for optimistic algorithms in function approximation regimes. We introduce the dissimilarity dimension, a complexity measure that yields tighter bounds.
This work provides a theoretically sound framework for iteratively exploring the space of perturbations, tested in pooled batches, in order to maximize a target phenotype under an experimental budget.
In the model selection problem, the objective is to design methods that, in an online fashion, select the algorithm best suited to a given problem instance.
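For intuition only (the abstract does not specify the method), one common formulation treats each candidate algorithm as an arm of an adversarial bandit and runs a meta-learner such as EXP3 over them. Below is a minimal sketch under that assumption; the step() interface on the base algorithms is hypothetical.

```python
import math
import random

def exp3_model_selection(base_algorithms, rounds, gamma=0.1):
    """Online model selection via EXP3: each candidate base algorithm
    is a bandit arm; one is sampled per round and reweighted by the
    reward it produces."""
    k = len(base_algorithms)
    log_weights = [0.0] * k  # kept in log space for numerical stability

    for _ in range(rounds):
        # Mix the exponential-weights distribution with uniform exploration.
        m = max(log_weights)
        exp_w = [math.exp(w - m) for w in log_weights]
        z = sum(exp_w)
        probs = [(1 - gamma) * w / z + gamma / k for w in exp_w]

        # Run the sampled base algorithm for one round; we assume (as an
        # illustration) that each exposes a step() returning a reward in [0, 1].
        i = random.choices(range(k), weights=probs)[0]
        reward = base_algorithms[i].step()

        # Importance-weighted reward estimate updates the chosen arm only.
        log_weights[i] += (gamma / k) * reward / probs[i]

    return probs  # final sampling distribution over base algorithms
```

The importance weighting keeps the reward estimate for each base algorithm unbiased even though only the sampled algorithm is actually run in a given round.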
In Reinforcement Learning it is standard to assume that the reward is an additive function of per-state feedback. In this work we challenge this assumption.
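For reference, the standard additive assumption and the non-additive generalization it contrasts with, written with illustrative notation (horizon H; the trajectory-level function f is a placeholder, not the paper's notation):

```latex
% Standard additive assumption: the return of a trajectory \tau
% decomposes as a sum of per-step rewards.
R(\tau) = \sum_{t=0}^{H-1} r(s_t, a_t)

% Non-additive setting: an arbitrary function of the whole trajectory.
R(\tau) = f(s_0, a_0, s_1, a_1, \dots, s_H)
```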