A Framework for Partially Observed Reward-States in RLHF