Representative design refers to the idea that experimental stimuli should be sampled or designed such that they represent the environments to which measured constructs are supposed to generalize. In this article we investigate the role of representative design in achieving valid and reliable psychological assessments by focusing on a widely used behavioral measure of risk taking: the Balloon Analogue Risk Task (BART). Specifically, we demonstrate that the typical implementation of this task violates the principle of representative design, thus conflicting with the expectations people likely form from real balloons. This observation may provide an explanation for the previously observed limitations in some of the BART's psychometric properties (e.g., convergent validity with other measures of risk taking). To experimentally test the effects of improved representative designs, we conducted two extensive empirical studies (N = 772 and N = 632), finding that participants acquired more accurate beliefs about the optimal behavior in the BART because of these task adaptations. Yet, improving the task's representativeness proved to be insufficient to enhance the BART's psychometric properties. It follows that for the development of valid behavioral measurement instruments (as are needed, for instance, in functional neuroimaging studies), our field has to overcome the philosophy of the repair program (i.e., fixing existing tasks). Instead, we suggest that the development of valid task designs requires novel ecological assessments, aimed at identifying those real-life behaviors and associated psychological processes that lab tasks are supposed to capture and generalize to.