The affordances task is an important tool for assessing cognition and visuomotor functioning, yet its test-retest reliability has not been established. In the affordances task, participants attend to a goal-directed task (e.g., classifying manipulable objects such as cups and pots) while suppressing the stimulus-driven, irrelevant reactions afforded by these objects (e.g., grasping their handles). This results in cognitive conflicts at both the task level and the response level. In the current study, we assessed the reliability of the affordances task for the first time. In doing so, we considered the “reliability paradox,” according to which behavioral tasks that produce highly replicable group-level effects often yield low test-retest reliability because traditional correlation methods are inadequate for capturing individual differences between participants. Alongside simple test-retest correlations, we employed a Bayesian generative model that was recently shown to yield more precise estimates of test-retest reliability. Two hundred and ninety-five participants completed an online version of the affordances task twice, one week apart. Performance on the online version replicated results obtained under in-lab administrations of the task. Whereas the simple correlation method yielded weak test-retest reliability estimates for the different effects, the generative model yielded good reliability estimates. The current results support the utility of the affordances task as a reliable behavioral tool for assessing group-level and individual differences in cognitive and visuomotor functioning. The results further support the use of generative modeling in the study of individual differences.
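
To make the "simple test-retest correlation" concrete, the following is a minimal sketch, not the study's analysis code. It assumes each participant's effect is scored as the difference between mean reaction times in two conditions (labeled here, hypothetically, as incongruent and congruent) and that reliability is the Pearson correlation of those scores across the two sessions. All function names and the data layout are illustrative assumptions. The Bayesian generative model is not shown.

```python
from math import sqrt
from statistics import mean


def effect_score(incongruent_rts, congruent_rts):
    """Difference score for one participant: mean incongruent RT minus mean congruent RT (ms)."""
    return mean(incongruent_rts) - mean(congruent_rts)


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence has zero variance")
    return cov / sqrt(var_x * var_y)


def retest_correlation(session1, session2):
    """Test-retest correlation of difference scores.

    Each session is a dict mapping a participant id to a pair
    (incongruent_rts, congruent_rts); this layout is an assumption.
    Only participants present in both sessions are used.
    """
    ids = sorted(set(session1) & set(session2))
    scores1 = [effect_score(*session1[p]) for p in ids]
    scores2 = [effect_score(*session2[p]) for p in ids]
    return pearson_r(scores1, scores2)
```

Because each difference score is computed from a finite number of noisy trials, trial-level noise enters both sessions' scores and tends to pull this correlation toward zero, which is the pattern the reliability paradox describes.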