Standard, well-established cognitive tasks that produce reliable effects in group comparisons also yield unreliable measurements when used to assess individual differences. This “reliability paradox” has been demonstrated in decision-conflict tasks such as the Simon, Flanker, and Stroop tasks, which measure various aspects of cognitive control. We aimed to address this paradox by implementing carefully calibrated versions of the standard tasks, with an additional manipulation to encourage processing of the conflicting information, as well as combinations of standard tasks. A series of experiments showed that a Flanker task and a combined Simon and Stroop task with the additional manipulation produced reliable estimates of individual differences in fewer than 100 trials per task, a marked improvement on the reliability seen in benchmark Flanker, Simon, and Stroop data. We make the new tasks freely available and discuss the theoretical and applied implications for how cognitive testing of individual differences is carried out.