Aldo Pacchiano

Latest

In-Context Learning for Pure Exploration
Meet Me at the Arm: The Cooperative Multi-Armed Bandits Problem with Shareable Arms
Post-training Large Language Models for Diverse High-Quality Responses
The Good, the Bad, and the Sampled: a No-Regret Approach to Safe Online Classification
Principled Fine-tuning of LLMs from User-Edits: A Medley of Preference, Supervision, and Reward
Language Model Personalization via Reward Factorization
Contextual Bandits with Stage-wise Constraints
On the Hardness of Bandit Learning
Multiple-policy Evaluation via Density Estimation
Adaptive Exploration for Multi-Reward Multi-Policy Evaluation
Feasible Action Search for Bandit Linear Programs via Thompson Sampling
Pure Exploration with Feedback Graphs
Second Order Bounds for Contextual Bandits with Function Approximation
ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization
A Theoretical Framework for Partially Observed Reward-States in RLHF
State-free Reinforcement Learning
Learning Rate-Free Reinforcement Learning: A Case for Model Selection with Non-Stationary Objectives
Provable Interactive Learning with Hindsight Instruction Feedback
Provably Sample Efficient RLHF via Active Preference Optimization
Data-Driven Regret Balancing for Online Model Selection in Bandits
Improving Offline RL by Blending Heuristics
Experiment Planning with Function Approximation
Anytime Model Selection in Linear Bandits
Supervised Pretraining Can Learn In-Context Reinforcement Learning
Transfer RL via the Undo Maps Formalism
A Unified Model and Dimension for Interactive Estimation
Leveraging Offline Data in Online Reinforcement Learning
Estimating Optimal Policy Value in General Linear Contextual Bandits
Parallelizing Contextual Bandits
Dueling RL: Reinforcement Learning with Trajectory Preferences
Neural Design for Genetic Perturbation Experiments
An Instance-Dependent Analysis for the Cooperative Multi-Player Multi-Armed Bandit
Learning General World Models in a Handful of Reward-Free Deployments
Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity
Best of Both Worlds Model Selection
Joint Representation Training in Sequential Tasks with Shared Structure
Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback
Meta Learning MDPs with Linear Transition Models
ES-ENAS: Blackbox Optimization over Hybrid Spaces via Combinatorial and Continuous Evolution
Neural Pseudo-Label Optimism for the Bank Loan Problem
Towards an Understanding of Default Policies in Multitask Policy Optimization
Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection
Unlocking Pixels for Reinforcement Learning via Implicit Attention
Towards Tractable Optimism in Model-Based Reinforcement Learning
Dynamic Balancing for Model Selection in Bandits and RL
Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity
Model Selection for Contextual Bandits and Reinforcement Learning
On the Theory of Reinforcement Learning with Once-per-Episode Feedback
Parallelizing Contextual Linear Bandits
Learning the Truth From Only One Side of the Story
Stochastic Bandits with Linear Constraints
Near Optimal Policy Optimization via REPS
Tactical Optimism and Pessimism for Deep Reinforcement Learning
Fairness with Continuous Optimal Transport
Effective Diversity in Population-Based Reinforcement Learning
Model Selection in Contextual Stochastic Bandit Problems
Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian
Accelerated Message Passing for Entropy-Regularized MAP Inference
Learning to Score Behaviors for Guided Policy Optimization
On Approximate Thompson Sampling with Langevin Algorithms
Ready Policy One: World Building Through Active Learning
Stochastic Flows and Geometric Optimization on the Orthogonal Group
Online Model Selection for Reinforcement Learning with Function Approximation
Regret Bound Balancing and Elimination for Model Selection in Bandits and RL
Wasserstein Fair Classification
Taming the Herd: Multi-Modal Meta-Learning with a Population of Agents
Regret Balancing for Bandit and RL Model Selection
Convergence Rates of Smooth Message Passing with Rounding in Entropy-Regularized MAP Inference
Practical Nonisotropic Monte Carlo Sampling in High Dimensions via Determinantal Point Processes
Provably Robust Blackbox Optimization for Reinforcement Learning
Robustness Guarantees for Mode Estimation with an Application to Bandits
From Complexity to Simplicity: Adaptive ES-Active Subspaces for Blackbox Optimization
ES-MAML: Simple Hessian-Free Meta Learning
Reinforcement Learning with Chromatic Networks for Compact Architecture Search
Computing Stable Solutions in Threshold Network Flow Games With Bounded Treewidth
KAMA-NNs: Low-Dimensional Rotation Based Neural Networks
Gen-Oja: A Two-time-scale approach for Streaming CCA
Online learning with kernel losses
Reinforcement Learning with Wasserstein Distance Regularisation, with Applications to Multipolicy Learning
Conditions Beyond Treewidth for Tightness of Higher-order LP Relaxations
Real Time Clustering of Time Series Using Triangular Potentials
Computational Approaches to Poisson Traces Associated to Finite Subgroups of Sp_{2n}(C)
A General Approach to Fairness with Optimal Transport
Geometrically Coupled Monte Carlo Sampling
Trace Reconstruction Problem