This work provides a theoretically sound framework for iteratively exploring the space of perturbations in pooled batches in order to maximize a target phenotype under an experimental budget.
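To make the setup concrete, below is a minimal sketch of a budgeted, batched exploration loop of this kind, assuming a discrete perturbation library, a UCB-style acquisition score, and a noisy scalar phenotype readout. The function name `select_batch`, the noise model, and all parameters are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_batch(means, counts, t, batch_size):
    """Pick a pooled batch of perturbations by a UCB-style score (illustrative)."""
    bonus = np.sqrt(2.0 * np.log(max(t, 2)) / np.maximum(counts, 1))
    scores = means + bonus
    scores[counts == 0] = np.inf          # try each perturbation at least once
    return np.argsort(scores)[-batch_size:]

n_perturbations, batch_size, budget = 50, 8, 10
true_effect = rng.normal(size=n_perturbations)   # hypothetical ground-truth effects
means = np.zeros(n_perturbations)
counts = np.zeros(n_perturbations)

for t in range(1, budget + 1):                   # one pooled experiment per round
    batch = select_batch(means, counts, t, batch_size)
    # Noisy phenotype readout for each perturbation in the pooled batch.
    readout = true_effect[batch] + 0.5 * rng.normal(size=batch_size)
    counts[batch] += 1
    means[batch] += (readout - means[batch]) / counts[batch]  # running means

print("best perturbation estimate:", int(np.argmax(means)))
```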
In the model-selection problem, the objective is to choose, in an online fashion, the algorithm best suited to a given problem instance.
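One common way to formalize this is as a bandit over candidate learners: in each round the selector runs one learner and observes its performance. The sketch below, assuming Bernoulli performance feedback and a UCB selector, is illustrative only; `true_quality` and the feedback model are hypothetical and not taken from this work.

```python
import numpy as np

rng = np.random.default_rng(1)

def choose(means, counts, t):
    """UCB choice among candidate base algorithms (illustrative)."""
    if np.any(counts == 0):
        return int(np.argmin(counts))            # run each learner at least once
    return int(np.argmax(means + np.sqrt(2 * np.log(t) / counts)))

n_learners, horizon = 3, 500
true_quality = np.array([0.4, 0.6, 0.75])        # hypothetical per-learner success rates
means = np.zeros(n_learners)
counts = np.zeros(n_learners)

for t in range(1, horizon + 1):
    i = choose(means, counts, t)
    reward = float(rng.random() < true_quality[i])   # Bernoulli feedback from learner i
    counts[i] += 1
    means[i] += (reward - means[i]) / counts[i]      # running mean per learner

print("learner selected most often:", int(np.argmax(counts)))
```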
In Reinforcement Learning it is standard to assume that the reward is an additive function of per-state feedback. In this work we challenge this assumption.
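For concreteness, the additive assumption can be written as a return that decomposes as a sum over per-step feedback, in contrast to a general trajectory-level reward; the notation below is ours, not necessarily the paper's.

```latex
% Standard additive assumption: the return of a trajectory
% \tau = (s_1, a_1, \ldots, s_H, a_H) decomposes over steps
R(\tau) \;=\; \sum_{h=1}^{H} r(s_h, a_h)

% Relaxed setting: the reward is an arbitrary function of the
% whole trajectory, e.g., observed only once per episode
R(\tau) \;=\; f\bigl(s_1, a_1, \ldots, s_H, a_H\bigr)
```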