Model Selection for Contextual Bandits and Reinforcement Learning