Provably Sample Efficient RLHF via Active Preference Optimization