Learning to predict action outcomes in morally conflicting situations is essential for social decision-making but poorly understood. Here we tested which forms of reinforcement learning theory capture how participants learn to choose between self-money and other-shocks, and how they adapt to changes in contingencies. We found that choices were better described by a reinforcement learning model based on the current value of separately expected outcomes than by one based on the combined historical values of past outcomes. Participants tracked expected values of self-money and other-shocks separately, with substantial individual differences in preference reflected in a valuation parameter balancing their relative weight. This valuation parameter also predicted choices in an independent costly helping task. The expectations of self-money and other-shocks were biased toward the favored outcome, but fMRI revealed that this bias was reflected in the ventromedial prefrontal cortex, whereas the pain-observation network represented pain prediction errors independently of individual preferences.
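
The model class summarized above can be illustrated with a minimal sketch. This is not the authors' fitted implementation: it assumes delta-rule (Rescorla-Wagner) updates for the two separately tracked expectations and a logistic choice rule between two options. The learning rate `alpha`, the inverse temperature `beta`, the valuation weight `w` and its direction (here `w = 1` means only self-money matters), and all function names are hypothetical choices made for illustration.

```python
import math


def update_expectation(expected, outcome, alpha):
    """Delta-rule update (assumed form): shift the expectation toward the outcome.

    Returns the new expectation and the prediction error that drove the update.
    """
    prediction_error = outcome - expected
    return expected + alpha * prediction_error, prediction_error


def decision_value(exp_money, exp_shock, w):
    """Combine the two separately tracked expectations into one value.

    w is a hypothetical valuation parameter in [0, 1]:
    w = 1 weighs only self-money, w = 0 weighs only other-shocks.
    Expected shock enters negatively because it is treated as a cost.
    """
    return w * exp_money - (1.0 - w) * exp_shock


def choice_probability(value_a, value_b, beta):
    """Logistic (two-option softmax) probability of choosing option A."""
    return 1.0 / (1.0 + math.exp(-beta * (value_a - value_b)))


if __name__ == "__main__":
    # One illustrative trial for a single option: money was delivered (1.0)
    # and the other person was shocked (1.0). Each expectation is updated
    # on its own, with its own prediction error.
    exp_money, pe_money = update_expectation(0.5, 1.0, alpha=0.3)
    exp_shock, pe_shock = update_expectation(0.5, 1.0, alpha=0.3)
    value = decision_value(exp_money, exp_shock, w=0.7)
    print(exp_money, exp_shock, value)
```

Keeping the two expectations separate until the moment of choice means that a single weight can express individual preference, while the prediction errors for money and for shock remain available as distinct signals.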