Aldo Pacchiano

Michael Jordan

Parallelizing Contextual Bandits
An Instance-Dependent Analysis for the Cooperative Multi-Player Multi-Armed Bandit
Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback
Parallelizing Contextual Linear Bandits
Tactical Optimism and Pessimism for Deep Reinforcement Learning
Accelerated Message Passing for Entropy-Regularized MAP Inference
Learning to Score Behaviors for Guided Policy Optimization
On Approximate Thompson Sampling with Langevin Algorithms
Convergence Rates of Smooth Message Passing with Rounding in Entropy-Regularized MAP Inference
Robustness Guarantees for Mode Estimation with an Application to Bandits
Gen-Oja: A Two-time-scale approach for Streaming CCA