Affective biases are commonly seen in disorders such as depression and anxiety, where individuals may show preferential attention towards, and more rapid processing of, negative or threatening stimuli. Affective biases have been shown to change with effective intervention: randomized controlled trials probing these biases and the mechanisms that underpin them may therefore yield greater understanding of how interventions can be improved and their success maximized. For such trials to be informative, we must have reliable ways of measuring affective bias over time, so that we can detect how interventions change these biases. In particular, the test-retest reliability of our measures puts an upper bound on our ability to detect effects; in this study, we therefore examine the test-retest reliability of two behavioural tasks that measure affective bias. We recruited 58 individuals in an online study, each of whom completed these tasks twice, with at least 14 days between sessions. We analysed the reliability of both summary statistics and parameters from computational models of these tasks using Pearson’s correlations and intra-class correlations. Standard summary statistic measures from the affective bias tasks had reliability ranging from 0.18 (poor) to 0.49 (moderate). Parameters from computational modelling were in many cases less reliable than the summary statistics. Embedding the covariance between sessions within the generative modelling framework, however, resulted in higher stability estimates. In sum, measures from these affective bias tasks are moderately reliable, but further work to improve their reliability would strengthen still further the inferences that can be drawn from randomized controlled trials.
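
As a concrete illustration of the reliability metrics named above, the sketch below simulates two-session scores for 58 subjects (matching the study's sample size, though the data here are entirely synthetic) and computes both a Pearson correlation and an intra-class correlation of type ICC(2,1) (two-way random effects, absolute agreement, single measure, per Shrout & Fleiss). This is a minimal, hypothetical example, not the study's analysis code; the variable names and noise parameters are assumptions.

```python
# Minimal sketch of two test-retest reliability metrics: Pearson's r between
# sessions and ICC(2,1). All data below are simulated for illustration only.
import numpy as np
from scipy.stats import pearsonr

def icc_2_1(scores: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `scores` is an (n_subjects, k_sessions) array.
    """
    n, k = scores.shape
    grand_mean = scores.mean()
    # ANOVA-style decomposition into subject, session, and error variance.
    ms_rows = k * np.sum((scores.mean(axis=1) - grand_mean) ** 2) / (n - 1)
    ms_cols = n * np.sum((scores.mean(axis=0) - grand_mean) ** 2) / (k - 1)
    ss_total = np.sum((scores - grand_mean) ** 2)
    ss_error = ss_total - ms_rows * (n - 1) - ms_cols * (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

rng = np.random.default_rng(0)
true_bias = rng.normal(0.0, 1.0, size=58)        # stable trait per subject
session1 = true_bias + rng.normal(0.0, 1.0, 58)  # session-specific noise
session2 = true_bias + rng.normal(0.0, 1.0, 58)

r, _ = pearsonr(session1, session2)
icc = icc_2_1(np.column_stack([session1, session2]))
print(f"Pearson r = {r:.2f}, ICC(2,1) = {icc:.2f}")
```

In this simulation the true trait and session noise have equal variance, so both metrics should land near 0.5, i.e. in the "moderate" range reported in the abstract; unlike Pearson's r, the ICC(2,1) is also penalized by systematic mean shifts between sessions.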