Behavioral data, although a common index of cognitive activity, have come under scrutiny for poor reliability, whether because of measurement noise or because reliable effects fail to replicate. Here, we argue that cognitive modeling can be used to enhance the test-retest reliability of behavioral measures by recovering individual-level parameters from behavioral data. We tested this empirically with the Probabilistic Stimulus Selection (PSS) task, which measures a participant's sensitivity to positive or negative reinforcement. An analysis of 400,000 simulations from an Adaptive Control of Thought-Rational (ACT-R) model of this task showed that the poor reliability of the task stems from the instability of its end estimates: because of the way the task works, the same participants may sometimes end up with apparently opposite scores. To recover the underlying interpretable parameters and enhance reliability, we used a Bayesian Maximum A Posteriori (MAP) procedure. With this procedure, we obtained parameters that were reliable across sessions (intraclass correlation coefficient ≈ 0.5). A follow-up study with a modified version of the task found the same pattern of results: very poor test-retest reliability in behavior but moderate reliability in the recovered parameters (intraclass correlation coefficient ≈ 0.4). Collectively, these results suggest that this approach can be used to provide measures with superior reliability and to gain greater insight into individual differences.
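To illustrate the kind of parameter-recovery procedure the abstract refers to, the sketch below shows a generic Bayesian MAP fit for a single participant. It is not the paper's ACT-R model of the PSS task; it assumes, purely for illustration, a simple two-option learner with separate learning rates for positive and negative feedback (hypothetical stand-ins for the sensitivity parameters) and Gaussian priors over those rates.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_log_posterior(params, choices, rewards, prior_mu=0.3, prior_sd=0.15):
    """Negative log-posterior for a hypothetical two-parameter learner:
    separate learning rates for positive and negative prediction errors."""
    alpha_pos, alpha_neg = params
    if not (0.0 < alpha_pos < 1.0 and 0.0 < alpha_neg < 1.0):
        return np.inf                       # reject out-of-range proposals
    q = np.zeros(2)                         # value estimates for the two options
    log_lik = 0.0
    for c, r in zip(choices, rewards):
        p = np.exp(q) / np.exp(q).sum()     # softmax choice rule
        log_lik += np.log(p[c] + 1e-12)
        delta = r - q[c]
        lr = alpha_pos if delta > 0 else alpha_neg
        q[c] += lr * delta                  # asymmetric value update
    # Gaussian priors regularize the estimates (MAP rather than plain MLE)
    log_prior = (norm.logpdf(alpha_pos, prior_mu, prior_sd)
                 + norm.logpdf(alpha_neg, prior_mu, prior_sd))
    return -(log_lik + log_prior)

def fit_participant(choices, rewards):
    """Return MAP estimates of the two learning rates for one participant."""
    res = minimize(neg_log_posterior, x0=[0.3, 0.3],
                   args=(np.asarray(choices), np.asarray(rewards)),
                   method="Nelder-Mead")
    return res.x
```

Fitting each participant's data from two sessions with a routine like this, and then computing the intraclass correlation coefficient over the resulting parameter estimates, is the general logic behind the reliability comparison described above.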