Language Model Personalization via Reward Factorization