Moving Developmental Research Online: Comparing In-Lab and Web-Based Studies of Model-Based Reinforcement Learning
For years, adult psychological research has benefitted from web-based data collection. There is growing interest in harnessing this approach to facilitate data collection from children and adolescents to address foundational questions about cognitive development. To date, however, few studies have directly tested whether findings from in-lab developmental psychology tasks can be replicated online, particularly in the domain of value-based learning and decision-making. To address this question, we set up a pipeline for online data collection with children, adolescents,...
Does insufficient sleep affect how you learn from reward or punishment? Reinforcement learning after 2 nights of sleep restriction
To learn from feedback (trial and error) is essential for all species. Insufficient sleep has been found to reduce the sensitivity to feedback as well as increase reward sensitivity. To determine whether insufficient sleep alters learning from positive and negative feedback, healthy participants (n = 32, mean age 29.0 years, 18 women) were tested once after normal sleep (8 hr time in bed for 2 nights) and once after 2 nights of sleep restriction (4 hr/night) on a probabilistic selection task where learning behaviour was evaluated in three ways: as generalised...
Momentary subjective well-being depends on learning and not reward
Subjective well-being or happiness is often associated with wealth. Recent studies suggest that momentary happiness is associated with reward prediction error, the difference between experienced and predicted reward, a key component of adaptive behaviour. We tested subjects in a reinforcement learning task in which reward size and probability were uncorrelated, allowing us to dissociate between the contributions of reward and learning to happiness. Using computational modelling, we found convergent evidence across stable and volatile learning tasks that happiness, like...
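The reward prediction error defined in this abstract (experienced minus predicted reward) can be illustrated with a minimal sketch. The function names and the `learning_rate` parameter are assumptions introduced here for illustration; this is a generic delta-rule update, not the computational model fitted in the study.

```python
def prediction_error(experienced, predicted):
    """Reward prediction error: experienced minus predicted reward."""
    return experienced - predicted


def update_expectation(predicted, experienced, learning_rate=0.1):
    """Move the prediction a fraction of the way toward the experienced reward.

    learning_rate is a hypothetical free parameter between 0 and 1.
    """
    return predicted + learning_rate * prediction_error(experienced, predicted)


if __name__ == "__main__":
    expectation = 0.5
    for reward in [1.0, 0.0, 1.0, 1.0]:
        delta = prediction_error(reward, expectation)
        expectation = update_expectation(expectation, reward)
        print(f"reward={reward:.1f} prediction_error={delta:+.3f} expectation={expectation:.3f}")
```

Because the update depends only on the prediction error, a large but fully expected reward produces no change in the expectation.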
Does posture influence the Stroop effect?
Rosenbaum, Mama, and Algom (2017) reported that participants who completed the Stroop task (i.e., name the hue of a color word when the hue and word meaning are congruent or incongruent) showed a smaller Stroop effect (i.e., the difference in response times between congruent and incongruent trials) when they performed the task standing than when sitting. We report five attempted replications (analyzed sample sizes: N = 108, N = 108, N = 98, N = 78, and N = 51, respectively) of Rosenbaum et al.’s findings, which were conducted in two...
Anxiety modulates preference for immediate rewards among trait-impulsive individuals: A hierarchical Bayesian analysis
Trait impulsivity, defined by strong preference for immediate over delayed rewards and difficulties inhibiting prepotent behaviors, is observed in all externalizing disorders, including substance-use disorders. Many laboratory tasks have been developed to identify decision-making mechanisms and correlates of impulsive behavior, but convergence between task measures and self-reports of impulsivity is consistently low. Long-standing theories of personality and decision-making predict that neurally mediated individual differences in sensitivity to (a) reward cues and (b)...
Reward processing modulates the association between trauma exposure and externalizing psychopathology
Childhood adversity is common and strongly associated with risk for psychopathology. Identifying factors that buffer children from experiencing psychopathology following adversity is critical for developing more effective intervention approaches. The present study examined several behavioral metrics of reward processing reflecting global approach motivation for reward and the degree to which reward responses scaled with reward value (i.e., behavioral sensitivity to reward value) as potential moderators of the association of multiple dimensions of adversity, including...
The reflection effect in memory-based decisions
Previous research has indicated a bias in memory-based decision-making, with people preferring options that they remember better. However, the cognitive mechanisms underlying this memory bias remain elusive. Here, we propose that choosing poorly remembered options is conceptually similar to choosing options with uncertain outcomes. We predicted that the memory bias would be reduced when options had negative subjective value, analogous to the reflection effect, according to which uncertainty aversion is stronger in gains than in losses. In two preregistered experiments...
The rational use of causal inference to guide reinforcement learning strengthens with age
Beliefs about the controllability of positive or negative events in the environment can shape learning throughout the lifespan. Previous research has shown that adults' learning is modulated by beliefs about the causal structure of the environment such that they update their value estimates to a lesser extent when the outcomes can be attributed to hidden causes. This study examined whether external causes similarly influenced outcome attributions and learning across development. Ninety participants, ages 7 to 25 years, completed a reinforcement learning task in which...
Differential reinforcement encoding along the hippocampal long axis helps resolve the explore–exploit dilemma
When making decisions, should one exploit known good options or explore potentially better alternatives? Exploration of spatially unstructured options depends on the neocortex, striatum, and amygdala. In natural environments, however, better options often cluster together, forming structured value distributions. The hippocampus binds reward information into allocentric cognitive maps to support navigation and foraging in such spaces. Here we report that human posterior hippocampus (PH) invigorates exploration while anterior hippocampus (AH) supports the transition to...
A dimensional investigation of error-related negativity (ERN) and self-reported psychiatric symptoms
Alterations in error processing are implicated in a range of DSM-defined psychiatric disorders. For instance, obsessive-compulsive disorder (OCD) and generalised anxiety disorder show enhanced electrophysiological responses to errors, i.e. error-related negativity (ERN), while others like schizophrenia have an attenuated ERN. However, as diagnostic categories in psychiatry are heterogeneous and also highly intercorrelated, the precise mapping of ERN enhancements/impairments is unclear. To address this, we recorded electroencephalograms (EEG) from 196 participants who...
Dopamine promotes instrumental motivation, but reduces reward-related vigour
We can be motivated when reward depends on performance, or merely by the prospect of a guaranteed reward. Performance-dependent (contingent) reward is instrumental, relying on an internal action-outcome model, whereas motivation by guaranteed reward may minimise opportunity cost in reward-rich environments. Competing theories propose that each type of motivation should be dependent on dopaminergic activity. We contrasted these two types of motivation with a rewarded saccade task, in patients with Parkinson’s disease (PD). When PD patients were ON dopamine, they had...
Adolescents exhibit reduced Pavlovian biases on instrumental learning
Multiple learning systems allow individuals to flexibly respond to opportunities and challenges present in the environment. An evolutionarily conserved Pavlovian learning mechanism couples valence and action, promoting a tendency to approach cues associated with reward and to inhibit action in the face of anticipated punishment. Although this default response system may be adaptive, these hard-wired reactions can hinder the ability to learn flexible instrumental actions in pursuit of a goal. Such constraints on behavioral flexibility have been studied extensively in...
Parallel model-based and model-free reinforcement learning for card sorting performance
The Wisconsin Card Sorting Test (WCST) is considered a gold standard for the assessment of cognitive flexibility. On the WCST, repeating a sorting category following negative feedback is typically treated as indicating reduced cognitive flexibility. Therefore such responses are referred to as ‘perseveration’ errors. Recent research suggests that the propensity for perseveration errors is modulated by response demands: They occur less frequently when their commitment repeats the previously executed response. Here, we propose parallel reinforcement-learning models of card...
Cognitive, Affective, and Feedback-Based Flexibility – Disentangling Shared and Different Aspects of Three Facets of Psychological Flexibility
Cognitive flexibility - the ability to adjust one's behavior to changing environmental demands - is crucial for controlled behavior. However, the term cognitive flexibility is used heterogeneously, and associations between cognitive flexibility and other facets of flexible behavior have only rarely been studied systematically. To resolve some of these conceptual uncertainties, we directly compared cognitive flexibility (cue-instructed switching between two affectively neutral tasks), affective flexibility (switching between a neutral and an affective task using...
Age differences in risk attitude are shaped by option complexity
The canonical conclusion from research on age differences in risky choice is that older adults are more risk averse than younger adults, at least in choices involving gains. Most of the evidence for this conclusion derives from studies that used a specific type of choice problem: choices between a safe and a risky option. However, safe and risky options differ not only in the degree of risk but also in the amount of information to be processed, that is, in their complexity. In both an online and a lab experiment, we demonstrate that differences in option complexity can...
Associations between aversive learning processes and transdiagnostic psychiatric symptoms revealed by large-scale phenotyping
Symptom expression in psychiatric conditions is often linked to altered threat perception; however, how the computational mechanisms that support aversive learning relate to specific psychiatric symptoms remains undetermined. We answer this question using an online game-based aversive learning task together with measures of common psychiatric symptoms in 400 subjects. We show that physiological symptoms of anxiety and a transdiagnostic compulsivity-related factor are associated with enhanced safety learning, as measured using a probabilistic computational model, while trait...
Information about action outcomes differentially affects learning from self-determined versus imposed choices
The valence of new information influences learning rates in humans: good news tends to receive more weight than bad news. We investigated this learning bias in four experiments, by systematically manipulating the source of required action (free versus forced choices), outcome contingencies (low versus high reward) and motor requirements (go versus no-go choices). Analysis of model-estimated learning rates showed that the confirmation bias in learning rates was specific to free choices, but was independent of outcome contingencies. The bias was also unaffected by the...
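A valence-dependent learning bias of the kind described in this abstract is commonly expressed as a value update with separate learning rates for positive and negative prediction errors. The sketch below is illustrative only: the function name and the parameters `alpha_pos` and `alpha_neg` are assumptions, and the study's actual model specification is not given in the abstract.

```python
def update_value(value, outcome, alpha_pos, alpha_neg):
    """Delta-rule update with valence-dependent learning rates.

    alpha_pos applies when the outcome is better than expected,
    alpha_neg when it is equal to or worse than expected.
    Both names are hypothetical.
    """
    delta = outcome - value  # prediction error
    alpha = alpha_pos if delta > 0 else alpha_neg
    return value + alpha * delta


if __name__ == "__main__":
    # With alpha_pos > alpha_neg, good news moves the estimate more than bad news.
    print(update_value(0.5, 1.0, alpha_pos=0.4, alpha_neg=0.1))
    print(update_value(0.5, 0.0, alpha_pos=0.4, alpha_neg=0.1))
```

Setting `alpha_pos` equal to `alpha_neg` recovers an unbiased learner, which gives a natural baseline when comparing conditions such as free versus forced choices.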
4 Arm Bandit Task Dataset
The dataset includes 975 participants, who completed an online version of the 4-arm bandit task in 2014. All participants gave their consent to take part in the experiment. The experiment was approved by the UCL Research Ethics Committee (project 4223/001). The dataset is anonymised and does not include information about the participants' identity. The task followed the 4-arm bandit paradigm described in Daw et al. (2006). In this task the participants were asked to choose between four options on multiple trials. On each trial they had to choose an option and were then given...
Vagus nerve stimulation boosts the drive to work for rewards
Interoceptive feedback transmitted via the vagus nerve plays a vital role in motivation by tuning actions according to physiological needs. Whereas vagus nerve stimulation (VNS) reinforces actions in animals, motivational effects elicited by VNS in humans are still largely elusive. Here, we applied non-invasive transcutaneous auricular VNS (taVNS) on the left or right ear while participants exerted effort to earn rewards using a randomized cross-over design (vs. sham). In line with preclinical studies, acute taVNS enhances invigoration of effort, and stimulation on the...
Attentional priorities drive effects of time pressure on altruistic choice
Dual-process models of altruistic choice assume that automatic responses give way to deliberation over time, and are a popular way to conceptualize how people make generous choices and why those choices might change under time pressure. However, these models have led to conflicting interpretations of behaviour and underlying psychological dynamics. Here, we propose that flexible, goal-directed deployment of attention towards information priorities provides a more parsimonious account of altruistic choice dynamics. We demonstrate that time pressure tends to produce early...
Biased belief updating and suboptimal choice in foraging decisions
Deciding which options to engage, and which to forego, requires developing accurate beliefs about the overall distribution of prospects. Here we adapt a classic prey selection task from foraging theory to examine how individuals keep track of an environment's reward rate and adjust choices in response to its fluctuations. Preference shifts were most pronounced when the environment improved compared to when it deteriorated. This is best explained by a trial-by-trial learning model in which participants estimate the reward rate with upward vs. downward changes controlled...
Humans primarily use model-based inference in the two-stage task
Distinct model-free and model-based learning processes are thought to drive both typical and dysfunctional behaviours. Data from two-stage decision tasks have seemingly shown that human behaviour is driven by both processes operating in parallel. However, in this study, we show that more detailed task instructions lead participants to make primarily model-based choices that have little, if any, simple model-free influence. We also demonstrate that behaviour in the two-stage task may falsely appear to be driven by a combination of simple model-free and model-based...
Rationally inattentive intertemporal choice
Discounting of future rewards is traditionally interpreted as evidence for an intrinsic preference in favor of sooner rewards. However, temporal discounting can also arise from internal uncertainty in value representations of future events, if one assumes that noisy mental simulations of the future are rationally combined with prior beliefs. Here, we further develop this idea by considering how simulation noise may be adaptively modulated by task demands, based on principles of rational inattention. We show how the optimal allocation of mental effort can give rise to...
Decisions bias future choices by modifying hippocampal associative memories
Decision-making is guided by memories of option values. However, retrieving items from memory renders them malleable. Here, we show that merely retrieving values from memory and making a choice between options is sufficient both to induce changes to stimulus-reward associations in the hippocampus and to bias future decision-making. After allowing participants to make repeated choices between reward-conditioned stimuli, in the absence of any outcome, we observe that participants prefer stimuli they have previously chosen, and neglect previously unchosen stimuli, over...
Inter-individual differences in resting-state functional connectivity are linked to interval timing in irregular contexts
Behavioral evidence suggests that different mechanisms mediate duration perception depending on whether regular or irregular cues for time estimation are provided, and that individual differences in interoceptive processing may affect duration perception only in the latter case. However, no study has addressed brain correlates of this proposed distinction. Here participants performed a duration reproduction task in two conditions: with unevenly spaced stimuli during time estimation/reproduction (irregular), with regularly spaced stimuli provided during the same...
Asymmetrical learning and memory for acquired gain versus loss associations
Neutral stimuli can acquire value when people learn to associate them with positive or negative outcomes (i.e., gain versus loss associations). Acquired value has been shown to affect how gain and loss associated stimuli are attended, remembered, and acted upon. Here we investigate a potential and previously unreported learning asymmetry in the acquisition of gain and loss associations that may have consequences for subsequent cognitive processing. In our first study, we provide meta-analytic evidence that in probabilistic learning tasks that pair neutral stimuli with...
Replicating patterns of prospect theory for decision under risk
Prospect theory is among the most influential frameworks in behavioural science, specifically in research on decision-making under risk. Kahneman and Tversky’s 1979 study tested financial choices under risk, concluding that such judgements deviate significantly from the assumptions of expected utility theory, which had remarkable impacts on science, policy and industry. Though substantial evidence supports prospect theory, many presumed canonical theories have drawn scrutiny for recent replication failures. In response, we directly test the original methods in a...
Paranoia as a deficit in non-social belief updating
Paranoia is the belief that harm is intended by others. It may arise from selective pressures to infer and avoid social threats, particularly in ambiguous or changing circumstances. We propose that uncertainty may be sufficient to elicit learning differences in paranoid individuals, without social threat. We used reversal learning behavior and computational modeling to estimate belief updating across individuals with and without mental illness, online participants, and rats chronically exposed to methamphetamine, an elicitor of paranoia in humans. Paranoia is associated...
Confidence drives a neural confirmation bias
A prominent source of polarised and entrenched beliefs is confirmation bias, where evidence against one’s position is selectively disregarded. This effect is most starkly evident when opposing parties are highly confident in their decisions. Here we combine human magnetoencephalography (MEG) with behavioural and neural modelling to identify alterations in post-decisional processing that contribute to the phenomenon of confirmation bias. We show that holding high confidence in a decision leads to a striking modulation of post-decision neural processing, such that...
On the convergent validity of risk sensitivity measures
There are a number of well-accepted ways to measure risk sensitivity, with researchers often making conclusions about individual differences based on a single task. Even though long-standing observations suggest that how risky outcomes are presented changes people's behavior, it is unclear whether risk sensitivity is a unitary trait that can be measured by any one of these instruments. To directly answer this question, we administered three tasks commonly used to elicit risk sensitivity within-subject to a large sample of participants on Amazon Mechanical Turk. Our...
Predictors of risky foraging behaviour in healthy young people
During adolescence and early adulthood, learning when to avoid threats and when to pursue rewards becomes crucial. Using a risky foraging task, we investigated individual differences in this dynamic across 781 individuals aged 14-24 years who were split into a hypothesis-generating discovery sample and a hold-out confirmation sample. Sex was the most important predictor of cautious behaviour and performance. Males earned one standard deviation (or 20%) more reward than females, collected more reward when there was little to lose and reduced foraging to the same level as...
A divisive model of evidence accumulation explains uneven weighting of evidence over time
Divisive normalization has long been used to account for computations in various neural processes and behaviours. The model proposes that inputs into a neural system are divisively normalized by the system’s total activity. More recently, dynamical versions of divisive normalization have been shown to account for how neural activity evolves over time in value-based decision making. Despite its ubiquity, divisive normalization has not been studied in decisions that require evidence to be integrated over time. Such decisions are important when the information is not all...
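The core operation named in this abstract, dividing each input by the system's total activity, can be sketched in its standard static form. The semi-saturation constant `sigma` and the function name are assumptions added for illustration; the dynamical, evidence-accumulation version studied in the paper is not reproduced here.

```python
def divisive_normalization(inputs, sigma=1.0):
    """Normalize each input by the summed activity of all inputs.

    sigma is a hypothetical semi-saturation constant that keeps the
    denominator positive when all inputs are zero.
    """
    total = sigma + sum(inputs)
    return [x / total for x in inputs]


if __name__ == "__main__":
    # The same input of 1.0 yields a smaller response in a more active context.
    print(divisive_normalization([1.0, 1.0]))
    print(divisive_normalization([1.0, 8.0]))
```

The example shows the context dependence that makes the model attractive: a response depends on how its input compares with everything else arriving at the same time, not on its absolute size.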
Anxiety Impedes Adaptive Social Learning Under Uncertainty
Very little is known about how individuals learn under uncertainty when other people are involved. We propose that humans are particularly tuned to social uncertainty, which is especially noisy and ambiguous. Individuals exhibiting less tolerance for uncertainty, such as those with anxiety, may have greater difficulty learning in uncertain social contexts and therefore provide an ideal test population to probe learning dynamics under uncertainty. Using a dynamic trust game and a matched nonsocial task, we found that healthy subjects (n = 257) were particularly good...
Confidence reports in decision-making with multiple alternatives violate the Bayesian confidence hypothesis
Decision confidence reflects our ability to evaluate the quality of decisions and guides subsequent behavior. Experiments on confidence reports have almost exclusively focused on two-alternative decision-making. In this realm, the leading theory is that confidence reflects the probability that a decision is correct (the posterior probability of the chosen option). There is, however, another possibility, namely that people are less confident if the best two options are closer to each other in posterior probability, regardless of how probable they are in absolute terms....
Mouse tracking reveals structure knowledge in the absence of model-based choice
Converging evidence has demonstrated that humans exhibit two distinct strategies when learning in complex environments. One is model-free learning, i.e., simple reinforcement of rewarded actions, and the other is model-based learning, which considers the structure of the environment. Recent work has argued that people exhibit little model-based behavior unless it leads to higher rewards. Here we use mouse tracking to study model-based learning in stochastic and deterministic (pattern-based) environments of varying difficulty. In both tasks participants' mouse movements...
A delay in sampling information from temporally autocorrelated visual stimuli
Much of our world changes smoothly in time, yet the allocation of attention is typically studied with sudden changes - transients. A sizeable lag in selecting feature information is seen when stimuli change smoothly. Yet this lag is not seen with temporally uncorrelated rapid serial visual presentation (RSVP) stimuli. This suggests that temporal autocorrelation of a feature paradoxically increases the latency at which information is sampled. To test this, participants are asked to report the color of a disk when a cue was presented. There is an increase in selection...
Generalizing to generalize: Humans flexibly switch between compositional and conjunctive structures during reinforcement learning
Humans routinely face novel environments in which they have to generalize in order to act adaptively. However, doing so involves the non-trivial challenge of deciding which aspects of a task domain to generalize. While it is sometimes appropriate to simply re-use a learned behavior, often adaptive generalization entails recombining distinct components of knowledge acquired across multiple contexts. Theoretical work has suggested a computational trade-off in which it can be more or less useful to learn and generalize aspects of task structure jointly or compositionally,...
Confidence controls perceptual evidence accumulation
Perceptual decisions are accompanied by feelings of confidence that reflect the likelihood that the decision was correct. Here we aim to clarify the relationship between perception and confidence by studying the same perceptual task across three different confidence contexts. Human observers were asked to categorize the source of sequentially presented visual stimuli. Each additional stimulus provided evidence for making more accurate perceptual decisions, and better confidence judgements. We show that observers’ ability to set appropriate evidence accumulation bounds...
Intermittent Absence of Control during Reinforcement Learning Interferes with Pavlovian Bias in Action Selection
The ability to control the occurrence of rewarding and punishing events is crucial for our well-being. Two ways to optimize performance are to follow heuristics like Pavlovian biases to approach reward and avoid loss or to rely more on slowly accumulated stimulus-action associations. Although reduced control over outcomes has been linked to suboptimal decision-making in clinical conditions associated with learned helplessness, it is unclear how uncontrollability of the environment is related to the arbitration between different response strategies. This study directly...
Temporal discounting correlates with directed exploration but not with random exploration
The explore-exploit dilemma describes the trade-off that occurs any time we must choose between exploring unknown options and exploiting options we know well. Implicit in this trade-off is how we value future rewards - exploiting is usually better in the short term, but in the longer term the benefits of exploration can be huge. Thus, in theory there should be a tight connection between how much people value future rewards, i.e. how much they discount future rewards relative to immediate rewards, and how likely they are to explore, with less ‘temporal discounting’...
Experimentally-induced and real-world acute anxiety have no effect on goal-directed behaviour
Goal-directed control guides optimal decision-making and it is an important cognitive faculty that protects against developing habits. Previous studies have found some evidence of goal-directed deficits when healthy individuals are stressed, and in psychiatric conditions characterised by compulsive behaviours and anxiety. Here, we tested if goal-directed control is affected by state anxiety, which might explain the former results. We carried out a causal test of this hypothesis in two experiments (between-subject N = 88; within-subject N = 50) that used the...
Doubting what you already know: Uncertainty regarding state transitions is associated with obsessive compulsive symptoms
Obsessive compulsive (OC) symptoms involve excessive information gathering (e.g., checking, reassurance-seeking), and uncertainty about possible, often catastrophic, future events. Here we propose that these phenomena are the result of excessive uncertainty regarding state transitions (transition uncertainty): a computational impairment in Bayesian inference leading to a reduced ability to use the past to predict the present and future, and to oversensitivity to feedback (i.e. prediction errors). Using a computational model of Bayesian learning under uncertainty in a...
The Confidence Database
Understanding how people rate their confidence is critical for the characterization of a wide range of perceptual, memory, motor and cognitive processes. To enable the continued exploration of these processes, we created a large database of confidence studies spanning a broad set of paradigms, participant populations and fields of study. The data from each study are structured in a common, easy-to-use format that can be easily imported and analysed using multiple software packages. Each dataset is accompanied by an explanation regarding the nature of the collected data....
Negative errors in time reproduction tasks
In time reproduction tasks, the reaction time of motor responses is intrinsically linked to the measure of perceptual timing. Decisions are based on a continuous comparison between elapsed time and a memory trace of the to-be-reproduced interval. Here, we investigate the possibility that negative reproduction errors can be explained by the tendency to prefer earlier over later response times, or whether the whole range of possible response times is shifted. In experiment 1, we directly compared point reproduction (participants indicate the exact time point of equality)...
Uncertainty in learning, choice, and visual fixation
Uncertainty plays a critical role in reinforcement learning and decision making. However, exactly how it influences behavior remains unclear. Multiarmed-bandit tasks offer an ideal test bed, since computational tools such as approximate Kalman filters can closely characterize the interplay between trial-by-trial values, uncertainty, learning, and choice. To gain additional insight into learning and choice processes, we obtained data from subjects' overt allocation of gaze. The estimated value and estimation uncertainty of options influenced what subjects looked at before...
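How a Kalman filter links value, uncertainty, and learning for a single bandit arm can be sketched with the textbook scalar update. This is an assumption-laden illustration: the function name, the parameter names, and the fixed `noise_variance` are hypothetical, and the approximate filter used in the study is not specified in the abstract.

```python
def kalman_update(mean, variance, reward, noise_variance):
    """Scalar Kalman update of one arm's estimated value after a reward.

    mean, variance: current value estimate and its uncertainty.
    noise_variance: assumed variance of the reward noise (hypothetical).
    """
    gain = variance / (variance + noise_variance)  # acts as a learning rate
    new_mean = mean + gain * (reward - mean)       # prediction-error update
    new_variance = (1.0 - gain) * variance         # uncertainty shrinks
    return new_mean, new_variance


if __name__ == "__main__":
    mean, variance = 0.0, 1.0
    for reward in [2.0, 1.5, 2.5]:
        mean, variance = kalman_update(mean, variance, reward, noise_variance=1.0)
        print(f"mean={mean:.3f} variance={variance:.3f}")
```

The gain plays the role of a learning rate that falls as estimation uncertainty falls, so early outcomes shift the value estimate more than later ones.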
Ventromedial prefrontal cortex compression during concept learning
Prefrontal cortex (PFC) is thought to support the ability to focus on goal-relevant information by filtering out irrelevant information, a process akin to dimensionality reduction. Here, we test this dimensionality reduction hypothesis by relating a data-driven approach to characterizing the complexity of neural representation with a theoretically-supported computational model of learning. We find evidence of goal-directed dimensionality reduction within human ventromedial PFC during learning. Importantly, by using computational predictions of each participant’s...
Behind closed doors: The role of depressed affect on risky choices under time pressure
Previous research suggests that depressive symptoms are associated with altered sensitivity to reward and punishment in various decision-making contexts. Building on this work, this study investigated whether depressed-affect symptoms influenced risky decision making under time pressure. The effect of depressed affect on risky choice was assessed in a reward (Experiments 1A and 1B) and loss (Experiment 2) context under low- and high-pressure conditions. Decisions involved learning to choose between a “sure” option and a “risky” option with identical expected values. In...
Aberrant cost–benefit integration during effort-based decision making relates to severity of substance use disorders
Aberrant cost–benefit decision making is a key factor related to individual differences in the expression of substance use disorders (SUDs). Previous research highlights how delay-cost sensitivity affects variability in SUDs; however, other forms of cost–benefit decision making—effort-based choice—have received less attention. We administered the Effort Expenditure for Rewards Task (EEfRT) in an SUD-enriched community sample (N = 80). Individuals with more severe SUDs were less likely to use information about expected value when deciding between high-effort,...
Controllability governs the balance between Pavlovian and instrumental action selection
A Pavlovian bias to approach reward-predictive cues and avoid punishment-predictive cues can conflict with instrumentally-optimal actions. Here, we propose that the brain arbitrates between Pavlovian and instrumental control by inferring which is a better predictor of reward. The instrumental predictor is more flexible; it can learn values that depend on both stimuli and actions, whereas the Pavlovian predictor learns values that depend only on stimuli. The arbitration theory predicts that the Pavlovian predictor will be favored when rewards are relatively...
Task complexity interacts with state-space uncertainty in the arbitration between model-based and model-free learning
It has previously been shown that the relative reliability of model-based and model-free reinforcement-learning (RL) systems plays a role in the allocation of behavioral control between them. However, the role of task complexity in the arbitration between these two strategies remains largely unknown. Here, using a combination of novel task design, computational modelling, and model-based fMRI analysis, we examined the role of task complexity alongside state-space uncertainty in the arbitration process. Participants tended to increase model-based RL control in response...