How the Level of Reward Awareness Changes the Computational and Electrophysiological Signatures of Reinforcement Learning
The extent to which subjective awareness influences reward processing, and thereby affects future decisions, is currently largely unknown. In the present report, we investigated this question in a reinforcement learning framework, combining perceptual masking, computational modeling, and electroencephalographic recordings (human male and female participants). Our results indicate that degrading the visibility of the reward decreased, without completely obliterating, the ability of participants to learn from outcomes, but concurrently increased their tendency to repeat...
Retroactive and graded prioritization of memory by reward
Many decisions are based on an internal model of the world. Yet, how such a model is constructed from experience and represented in memory remains unknown. We test the hypothesis that reward shapes memory for sequences of events by retroactively prioritizing memory for objects as a function of their distance from reward. Human participants encountered neutral objects while exploring a series of mazes for reward. Across six data sets, we find that reward systematically modulates memory for neutral objects, retroactively prioritizing memory for objects closest to the...
Neural mechanisms for learning self and other ownership
Sense of ownership is a ubiquitous and fundamental aspect of human cognition. Here we used model-based functional magnetic resonance imaging and a novel minimal ownership paradigm to probe the behavioural and neural mechanisms underpinning ownership acquisition for ourselves, friends and strangers. We find a self-ownership bias at multiple levels of behaviour from initial preferences to reaction times and computational learning rates. Ventromedial prefrontal cortex (vmPFC) and anterior cingulate sulcus (ACCs) responded more to self vs. stranger associations, but despite...
Generalization guides human exploration in vast decision spaces
From foraging for food to learning complex games, many aspects of human behaviour can be framed as a search problem with a vast space of possible actions. Under finite search horizons, optimal solutions are generally unobtainable. Yet, how do humans navigate vast problem spaces, which require intelligent exploration of unobserved actions? Using various bandit tasks with up to 121 arms, we study how humans search for rewards under limited search horizons, in which the spatial correlation of rewards (in both generated and natural environments) provides traction for...
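The search strategy described here is commonly formalized as function learning plus an uncertainty bonus. The sketch below is a minimal illustration under assumed settings (a one-dimensional grid of arms, a squared-exponential kernel, and a fixed bonus weight), not the authors' model or code; it shows how kernel-based generalization lets observed rewards inform predictions about unobserved arms.

```python
# Minimal sketch: kernel-based generalization over a spatially correlated
# bandit, with an upper-confidence-bound choice rule (assumed parameters).
import numpy as np

def rbf_kernel(x1, x2, length_scale=2.0):
    """Squared-exponential kernel encoding spatial similarity between arms."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_all, length_scale=2.0, noise=0.1):
    """Gaussian-process posterior mean and variance over every arm location."""
    K = rbf_kernel(x_obs, x_obs, length_scale) + noise * np.eye(len(x_obs))
    k_star = rbf_kernel(x_all, x_obs, length_scale)
    K_inv = np.linalg.inv(K)
    mean = k_star @ K_inv @ y_obs
    var = 1.0 - np.sum((k_star @ K_inv) * k_star, axis=1)
    return mean, np.maximum(var, 1e-9)

rng = np.random.default_rng(0)
arms = np.arange(30.0)                              # hypothetical 1-D grid of arms
true_rewards = np.sin(arms / 4.0)                   # spatially correlated payoffs
first = int(rng.integers(len(arms)))
x_obs = [float(first)]
y_obs = [true_rewards[first] + 0.1 * rng.standard_normal()]

for t in range(20):
    mean, var = gp_posterior(np.array(x_obs), np.array(y_obs), arms)
    ucb = mean + 2.0 * np.sqrt(var)                 # uncertainty acts as an exploration bonus
    choice = int(np.argmax(ucb))
    x_obs.append(float(choice))
    y_obs.append(true_rewards[choice] + 0.1 * rng.standard_normal())
```

Because the kernel ties nearby arms together, a handful of observations already constrains the value of arms that were never sampled, which is what makes search in large spaces tractable.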
Magnitude and incentives: revisiting the overweighting of extreme events in risky decisions from experience
Recent experimental evidence in experience-based decision-making suggests that people are more risk seeking in the gains domain relative to the losses domain. This critical result is at odds with the standard reflection effect observed in description-based choice and explained by Prospect Theory. The so-called reversed-reflection effect has been predicated on the extreme-outcome rule, which suggests that memory biases affect risky choice from experience. To test the general plausibility of the rule, we conducted two experiments examining how the magnitude of prospective...
Reference-point centering and range-adaptation enhance human reinforcement learning at the cost of irrational preferences
In economics and perceptual decision-making, contextual effects are well documented: decision weights are adjusted as a function of the distribution of stimuli. Yet, in the reinforcement learning literature, whether and how contextual information pertaining to decision states is integrated into learning algorithms has received comparatively little attention. Here, we investigate reinforcement learning behavior and its computational substrates in a task where we orthogonally manipulate outcome valence and magnitude, resulting in systematic variations in state-values. Model...
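A common way to formalize such contextual effects is to re-express outcomes relative to learned context statistics before they update option values. The following sketch is an assumed illustration of that idea; the learning rates and the range estimator are placeholders, not the fitted model.

```python
# Minimal sketch: outcomes are re-expressed relative to a learned context value
# (centering) and a learned context range (range adaptation) before updating
# the option value. All parameter values are hypothetical.
alpha_q, alpha_v, alpha_r = 0.3, 0.3, 0.3

def update(q, context_value, context_range, outcome):
    # normalise the raw outcome by the statistics of its context
    relative_outcome = (outcome - context_value) / max(context_range, 1e-6)
    q += alpha_q * (relative_outcome - q)                          # option value
    context_value += alpha_v * (outcome - context_value)           # running mean
    context_range += alpha_r * (abs(outcome - context_value) - context_range)
    return q, context_value, context_range

q, v, r = 0.0, 0.0, 1.0
for outcome in [10.0, 0.0, 10.0, 10.0, 0.0]:     # a high-magnitude context
    q, v, r = update(q, v, r, outcome)
print(round(q, 3))   # the learned value lives on a context-relative scale
```

Re-scaling in this way makes learning efficient within a context, but it is also what can produce irrational preferences once options from different contexts are compared directly.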
Frontal network dynamics reflect neurocomputational mechanisms for reducing maladaptive biases in motivated action
Motivation exerts control over behavior by eliciting Pavlovian responses, which can either match or conflict with instrumental action. We can overcome maladaptive motivational influences putatively through frontal cognitive control. However, the neurocomputational mechanisms subserving this control are unclear; does control entail up-regulating instrumental systems, down-regulating Pavlovian systems, or both? We combined electroencephalography (EEG) recordings with a motivational Go/NoGo learning task (N = 34), in which multiple Go options enabled us to...
The Tortoise and the Hare: Interactions between Reinforcement Learning and Working Memory
Learning to make rewarding choices in response to stimuli depends on a slow but steady process, reinforcement learning, and a fast and flexible, but capacity-limited process, working memory. Using both systems in parallel, with their contributions weighted based on performance, should allow us to leverage the best of each system: rapid early learning, supplemented by long-term robust acquisition. However, this assumes that using one process does not interfere with the other. We use computational modeling to investigate the interactions between the two processes in a...
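One simple way to express this parallel-systems idea is to mix a slow delta-rule learner with a fast, capacity-limited one-shot store in a single choice policy. The sketch below is a toy illustration under assumed parameters (mixture weight, capacity, learning rate), not the published RL-plus-working-memory model.

```python
# Minimal sketch: a slow incremental learner (q) and a fast but capacity-limited
# one-shot store (wm) are mixed into one policy with weight w_eff.
import numpy as np

n_stimuli, n_actions = 6, 3
alpha, beta, w, capacity = 0.1, 5.0, 0.7, 3       # hypothetical parameters
rng = np.random.default_rng(1)

q = np.ones((n_stimuli, n_actions)) / n_actions   # slow RL values
wm = np.ones((n_stimuli, n_actions)) / n_actions  # fast WM policy
correct = rng.integers(n_actions, size=n_stimuli)

w_eff = w * min(1.0, capacity / n_stimuli)        # WM weight shrinks with load
for trial in range(200):
    s = rng.integers(n_stimuli)
    softmax_rl = np.exp(beta * q[s]) / np.exp(beta * q[s]).sum()
    policy = w_eff * wm[s] + (1 - w_eff) * softmax_rl
    a = rng.choice(n_actions, p=policy / policy.sum())
    reward = 1.0 if a == correct[s] else 0.0
    q[s, a] += alpha * (reward - q[s, a])          # incremental RL update
    wm[s] = np.full(n_actions, (1 - reward) / max(n_actions - 1, 1))
    wm[s, a] = reward                              # one-shot WM update
```

With few stimuli the WM component dominates and performance rises quickly; as set size exceeds capacity, choices fall back on the slower incremental values.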
Planning complexity registers as a cost in metacontrol
Decision-making algorithms face a basic tradeoff between accuracy and effort (i.e., computational demands). It is widely agreed that humans can choose between multiple decision-making processes that embody different solutions to this tradeoff: Some are computationally cheap but inaccurate, whereas others are computationally expensive but accurate. Recent progress in understanding this tradeoff has been catalyzed by formalizing it in terms of model-free (i.e., habitual) versus model-based (i.e., planning) approaches to reinforcement learning. Intuitively, if two tasks...
Pavlovian Control of Escape and Avoidance
To survive in complex environments, animals need to have mechanisms to select effective actions quickly, with minimal computational costs. As perhaps the computationally most parsimonious of these systems, Pavlovian control accomplishes this by hardwiring specific stereotyped responses to certain classes of stimuli. It is well documented that appetitive cues initiate a Pavlovian bias toward vigorous approach; however, Pavlovian responses to aversive stimuli are less well understood. Gaining a deeper understanding of aversive Pavlovian responses, such as active...
Reward learning over weeks versus minutes increases the neural representation of value in the human brain
Over the past few decades, neuroscience research has illuminated the neural mechanisms supporting learning from reward feedback. Learning paradigms are increasingly being extended to study mood and psychiatric disorders as well as addiction. However, one potentially critical characteristic that this research ignores is the effect of time on learning: human feedback learning paradigms are usually conducted in a single rapidly paced session, whereas learning experiences in ecologically relevant circumstances and in animal research are almost always separated by longer...
Aversion to option loss in a restless bandit task
In everyday life, people need to make choices without full information about the environment, which poses an explore-exploit dilemma in which one must balance the need to learn about the world and the need to obtain rewards from it. The explore-exploit dilemma is often studied using the multi-armed restless bandit task, in which people repeatedly select from multiple options, and human behaviour is modelled as a form of reinforcement learning via Kalman filters. Inspired by work in the judgment and decision-making literature, we present two experiments using multi-armed...
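Modelling restless-bandit learning with Kalman filters amounts to tracking a posterior mean and variance per option, letting uncertainty grow with the assumed payoff drift. The sketch below is a minimal illustration; the noise parameters and the noisy-greedy choice rule are chosen for convenience, not taken from the paper.

```python
# Minimal sketch: one Kalman filter estimate (mean, variance) per arm of a
# restless bandit. All variances and the choice rule are assumptions.
import numpy as np

n_arms = 4
obs_noise, drift_noise = 4.0, 1.0                  # hypothetical variances
mean = np.zeros(n_arms)                            # posterior means
var = np.full(n_arms, 100.0)                       # posterior variances

def kalman_update(mean, var, arm, reward):
    """All arms' uncertainty grows with the drift; only the chosen arm is observed."""
    var = var + drift_noise                        # payoffs drift every trial
    mean = mean.copy()
    gain = var[arm] / (var[arm] + obs_noise)       # Kalman gain
    mean[arm] += gain * (reward - mean[arm])
    var[arm] *= (1.0 - gain)
    return mean, var

rng = np.random.default_rng(2)
true_means = rng.normal(0, 5, n_arms)
for t in range(100):
    choice = int(np.argmax(mean + rng.normal(0, 1, n_arms)))    # noisy greedy
    true_means += rng.normal(0, np.sqrt(drift_noise), n_arms)   # restless walk
    reward = true_means[choice] + rng.normal(0, np.sqrt(obs_noise))
    mean, var = kalman_update(mean, var, choice, reward)
```

The key feature for the explore-exploit trade-off is that unchosen arms accumulate uncertainty, so ignoring an option for too long eventually makes it worth sampling again.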
The effect of attention and working memory on the estimation of elapsed time
Psychological models of time perception involve attention and memory: while attention typically regulates the flow of events, memory maintains timed events or intervals. The precise, and possibly distinct, roles of attention and memory in time perception remain debated. In this behavioral study, we tested 48 participants in a prospective duration estimation task while they fully attended to time or performed a working memory (WM) task. We report that paying attention to time lengthened perceived duration in the range of seconds to minutes, whereas diverting attention...
Dissociable effects of surprising rewards on learning and memory
Reward-prediction errors track the extent to which rewards deviate from expectations, and aid in learning. How do such errors in prediction interact with memory for the rewarding episode? Existing findings point to both cooperative and competitive interactions between learning and memory mechanisms. Here, we investigated whether learning about rewards in a high-risk context, with frequent, large prediction errors, would give rise to higher fidelity memory traces for rewarding events than learning in a low-risk context. Experiment 1 showed that recognition was better for...
Individual differences in first- and second-order temporal judgment
The ability of subjects to identify and reproduce brief temporal intervals is influenced by many factors, whether they be stimulus-based, task-based or subject-based. The current study examines the role individual differences play in subsecond and suprasecond timing judgments, using the schizotypy personality scale as a test-case approach for quantifying a broad range of individual differences. In two experiments, 129 (Experiment 1) and 141 (Experiment 2) subjects completed the O-LIFE personality questionnaire prior to performing a modified temporal-bisection task. In...
Prospect theory reflects selective allocation of attention.
There is a disconnect in the literature between analyses of risky choice based on cumulative prospect theory (CPT) and work on predecisional information processing. One likely reason is that for expectation models (e.g., CPT), it is often assumed that people behaved only as if they conducted the computations leading to the predicted choice and that the models are thus mute regarding information processing. We suggest that key psychological constructs in CPT, such as loss aversion and outcome and probability sensitivity, can be interpreted in terms of attention...
Forecasting the outcome of a time-varying Bernoulli process: Data from a laboratory experiment
The data presented in this article are related to the research article entitled “Discrete Adjustment to a Changing Environment: Experimental Evidence” (Khaw et al., 2017) [1]. We present data from a laboratory experiment that asks subjects to forecast the outcome of a time-varying Bernoulli process. On a computer program, subjects draw rings with replacement from a virtual box containing green and red rings in an unknown proportion. Subjects provide their estimates of the probability of drawing a green ring. They are rewarded for their participation and for the accuracy...
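For concreteness, the forecasting problem can be simulated as a Bernoulli process whose success probability occasionally jumps, paired with a simple leaky-integration estimate of the current probability. This is an assumed illustration of the task structure, not the experimental software or the model analysed in the companion article.

```python
# Minimal sketch: draws from a time-varying Bernoulli process plus a
# leaky-integration forecast. Change-point rate and leak are hypothetical.
import numpy as np

rng = np.random.default_rng(3)
leak = 0.1                                   # hypothetical integration rate
p_true, estimate = 0.5, 0.5
history = []

for t in range(300):
    if rng.random() < 0.02:                  # occasional unsignalled change point
        p_true = rng.random()
    ring_is_green = rng.random() < p_true    # draw a ring with replacement
    estimate += leak * (float(ring_is_green) - estimate)
    history.append((p_true, estimate))
```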
Prior preferences beneficially influence social and non-social learning
Our personal preferences affect a broad array of social behaviors. This includes the way we learn the preferences of others, an ability that often relies on limited or ambiguous information. Here we report an egocentric influence on this type of social learning that is reflected in both performance and response times. Using computational models that combine inter-trial learning and intra-trial choice, we find transient effects of participants' preferences on the learning process, through the influence of priors, and persistent effects on the choice process. A second...
Risk preference shares the psychometric structure of major psychological traits
To what extent is there a general factor of risk preference, R, akin to g, the general factor of intelligence? Can risk preference be regarded as a stable psychological trait? These conceptual issues persist because few attempts have been made to integrate multiple risk-taking measures, particularly measures from different and largely unrelated measurement traditions (self-reported propensity measures assessing stated preferences, incentivized behavioral measures eliciting revealed preferences, and frequency measures assessing actual risky activities). Adopting a...
A causal role for right frontopolar cortex in directed, but not random, exploration
The explore-exploit dilemma occurs anytime we must choose between exploring unknown options for information and exploiting known resources for reward. Previous work suggests that people use two different strategies to solve the explore-exploit dilemma: directed exploration, driven by information seeking, and random exploration, driven by decision noise. Here, we show that these two strategies rely on different neural systems. Using transcranial magnetic stimulation to inhibit the right frontopolar cortex, we were able to selectively inhibit directed exploration while...
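The two strategies are often operationalized as separate knobs in a single choice rule: an information bonus implements directed exploration, and a softmax temperature implements random exploration. The sketch below uses hypothetical values to show that either knob can be turned down independently, which is the logic behind selectively disrupting one strategy.

```python
# Minimal sketch: value plus information bonus, passed through a softmax.
import numpy as np

def choice_probabilities(means, uncertainties, info_bonus=1.0, temperature=1.0):
    """Softmax over value + information bonus for each option (assumed form)."""
    utilities = means + info_bonus * uncertainties
    z = (utilities - utilities.max()) / temperature
    p = np.exp(z)
    return p / p.sum()

means = np.array([5.0, 4.0])
uncertainties = np.array([0.5, 2.0])          # option 2 is less well known
print(choice_probabilities(means, uncertainties, info_bonus=1.0))   # directed pull
print(choice_probabilities(means, uncertainties, info_bonus=0.0, temperature=3.0))
```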
Confirmation bias in human reinforcement learning: Evidence from counterfactual feedback processing
Previous studies suggest that factual learning, that is, learning from obtained outcomes, is biased, such that participants preferentially take into account positive, as compared to negative, prediction errors. However, whether or not the prediction error valence also affects counterfactual learning, that is, learning from forgone outcomes, is unknown. To address this question, we analysed the performance of two groups of participants on reinforcement learning tasks using a computational model that was adapted to test if prediction error valence influences learning. We...
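The kind of valence-dependent updating tested here can be written as a delta rule whose learning rate depends on the sign of the prediction error, applied to both obtained and forgone outcomes. The sketch below illustrates one such asymmetry pattern with placeholder learning rates; it is not the fitted model from the study.

```python
# Minimal sketch: obtained and forgone outcomes each update their option's
# value, with a larger learning rate for choice-confirming prediction errors.
def update_values(q_chosen, q_unchosen, r_obtained, r_forgone,
                  lr_confirm=0.3, lr_disconfirm=0.1):
    pe_factual = r_obtained - q_chosen
    pe_counterfactual = r_forgone - q_unchosen
    # confirming news: good obtained outcomes, bad forgone outcomes
    q_chosen += (lr_confirm if pe_factual > 0 else lr_disconfirm) * pe_factual
    q_unchosen += (lr_disconfirm if pe_counterfactual > 0 else lr_confirm) * pe_counterfactual
    return q_chosen, q_unchosen

print(update_values(0.0, 0.0, 1.0, 0.0))   # confirming evidence moves values fastest
```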
Learning relative values in the striatum induces violations of normative decision making
To decide optimally between available options, organisms need to learn the values associated with these options. Reinforcement learning models offer a powerful explanation of how these values are learnt from experience. However, human choices often violate normative principles. We suggest that seemingly counterintuitive decisions may arise as a natural consequence of the learning mechanisms deployed by humans. Here, using fMRI and a novel behavioural task, we show that, when suddenly switched to novel choice contexts, participants' choices are incongruent with values...
Hyper-responsivity to losses in the anterior insula during economic choice scales with depression severity
Commonly observed distortions in decision-making among patients with major depressive disorder (MDD) may emerge from impaired reward processing and cognitive biases toward negative events. There is substantial theoretical support for the hypothesis that MDD patients overweight potential losses compared with gains, though the neurobiological underpinnings of this bias are uncertain. Twenty-one unmedicated patients with MDD were compared with 25 healthy controls (HC) using functional magnetic resonance imaging (fMRI) together with an economic decision-making task over...
Cognitive components underpinning the development of model-based learning
Reinforcement learning theory distinguishes “model-free” learning, which fosters reflexive repetition of previously rewarded actions, from “model-based” learning, which recruits a mental model of the environment to flexibly select goal-directed actions. Whereas model-free learning is evident across development, recruitment of model-based learning appears to increase with age. However, the cognitive processes underlying the development of model-based learning remain poorly characterized. Here, we examined whether age-related differences in cognitive processes underlying...
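The model-free/model-based distinction can be made concrete on a toy two-step problem: the model-based learner plans through a transition model, while the model-free learner caches values directly from experienced returns. The structure and numbers below are illustrative assumptions only, not the task used in the study.

```python
# Minimal sketch contrasting the two evaluation schemes on a toy two-step task.
import numpy as np

# transition model: action 0 usually leads to state 0, action 1 to state 1
transitions = np.array([[0.7, 0.3],
                        [0.3, 0.7]])
q_second_stage = np.array([0.8, 0.2])        # learned second-stage values

# model-based: plan through the transition model
q_model_based = transitions @ q_second_stage

# model-free: cache first-stage values directly from experienced returns
q_model_free = np.array([0.5, 0.5])
alpha = 0.2
experienced_return, chosen_action = 0.2, 0   # a rare transition paid off poorly
q_model_free[chosen_action] += alpha * (experienced_return - q_model_free[chosen_action])

print(q_model_based)   # still prefers action 0, because the model says it should
print(q_model_free)    # has been pushed toward action 1 by the raw outcome
```

After a rare transition, the two learners disagree about which action is better; the degree to which choices follow the model-based prescription is the usual developmental readout.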
Catecholaminergic challenge uncovers distinct Pavlovian and instrumental mechanisms of motivated (in)action
Catecholamines modulate the impact of motivational cues on action. Such motivational biases have been proposed to reflect cue-based, Pavlovian effects. Here, we assess whether motivational biases may also arise from asymmetrical instrumental learning of active and passive responses following reward and punishment outcomes. We present a novel paradigm, allowing us to disentangle the impact of reward and punishment on instrumental learning from Pavlovian response biasing. Computational analyses showed that motivational biases reflect both Pavlovian and instrumental...
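Pavlovian response biasing is typically modelled as a stimulus-value term added to the weight of the "go" response, on top of instrumentally learned action values. The sketch below is a minimal illustration with hypothetical weights, not the paper's full computational model.

```python
# Minimal sketch: Pavlovian stimulus value biases the 'go' action weight.
import numpy as np

def p_go(q_go, q_nogo, stimulus_value, pavlovian_weight=0.5, go_bias=0.2):
    """Reward-predictive stimuli push toward 'go' regardless of instrumental values."""
    w_go = q_go + go_bias + pavlovian_weight * stimulus_value
    w_nogo = q_nogo
    return 1.0 / (1.0 + np.exp(-(w_go - w_nogo)))

# an aversive stimulus (negative value) suppresses action even when 'go' is correct
print(p_go(q_go=1.0, q_nogo=0.0, stimulus_value=-2.0))
print(p_go(q_go=1.0, q_nogo=0.0, stimulus_value=+2.0))
```

The paper's point is that biases of this kind can also arise from asymmetric instrumental learning of active versus passive responses, which is why the two sources have to be disentangled experimentally.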
The effect of atomoxetine on random and directed exploration in humans
The adaptive regulation of the trade-off between pursuing a known reward (exploitation) and sampling lesser-known options in search of something better (exploration) is critical for optimal performance. Theory and recent empirical work suggest that humans use at least two strategies for solving this dilemma: a directed strategy in which choices are explicitly biased toward information seeking, and a random strategy in which decision noise leads to exploration by chance. Here we examined the hypothesis that random exploration is governed by the neuromodulatory locus...
Cost-Benefit Arbitration Between Multiple Reinforcement-Learning Systems
Human behavior is sometimes determined by habit and other times by goal-directed planning. Modern reinforcement-learning theories formalize this distinction as a competition between a computationally cheap but inaccurate model-free system that gives rise to habits and a computationally expensive but accurate model-based system that implements planning. It is unclear, however, how people choose to allocate control between these systems. Here, we propose that arbitration occurs by comparing each system's task-specific costs and benefits. To investigate this proposal, we...
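The arbitration proposal can be caricatured as a comparison of each controller's expected payoff net of its computational cost. The sketch below is only a schematic illustration with made-up quantities, not the model developed in the paper.

```python
# Minimal sketch: the controller with the larger reward-minus-cost gets control.
def arbitrate(reward_mb, reward_mf, planning_cost):
    net_mb = reward_mb - planning_cost       # accurate but effortful planning
    net_mf = reward_mf                       # cheap habit, no planning cost
    return "model-based" if net_mb > net_mf else "model-free"

print(arbitrate(reward_mb=1.0, reward_mf=0.7, planning_cost=0.1))   # planning pays
print(arbitrate(reward_mb=1.0, reward_mf=0.7, planning_cost=0.5))   # too costly
```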
Cognitive states influence dopamine-driven aberrant learning in Parkinson's disease
Individual differences in dopaminergic tone underlie tendencies to learn from reward versus punishment. These effects are well documented in Parkinson's patients, who vacillate between low and high tonic dopaminergic states as a function of medication. Yet very few studies have investigated the influence of higher-level cognitive states known to affect downstream dopaminergic learning in Parkinson's patients. A dopamine-dependent cognitive influence over learning would provide a candidate mechanism for declining cognitive integrity and motivation in Parkinson's patients....
Behavioural and neural characterization of optimistic reinforcement learning
When forming and updating beliefs about future life outcomes, people tend to consider good news and to disregard bad news. This tendency is assumed to support the optimism bias. Whether this learning bias is specific to ‘high-level’ abstract belief update or a particular expression of a more general ‘low-level’ reinforcement learning process is unknown. Here we report evidence in favour of the second hypothesis. In a simple instrumental learning task, participants incorporated better-than-expected outcomes at a higher rate than worse-than-expected ones. In addition,...
Pupil-linked arousal is driven by decision uncertainty and alters serial choice bias
While judging their sensory environments, decision-makers seem to use the uncertainty about their choices to guide adjustments of their subsequent behaviour. One possible source of these behavioural adjustments is arousal: decision uncertainty might drive the brain's arousal systems, which control global brain state and might thereby shape subsequent decision-making. Here, we measure pupil diameter, a proxy for central arousal state, in human observers performing a perceptual choice task of varying difficulty. Pupil dilation, after choice but before external feedback,...
From anomalies to forecasts: Toward a descriptive model of decisions under risk, under ambiguity, and from experience.
Experimental studies of choice behavior document distinct, and sometimes contradictory, deviations from maximization. For example, people tend to overweight rare events in 1-shot decisions under risk, and to exhibit the opposite bias when they rely on past experience. The common explanations of these results assume that the contradicting anomalies reflect situation-specific processes that involve the weighting of subjective values and the use of simple heuristics. The current article analyzes 14 choice anomalies that have been described by different models, including...
Modeling Avoidance in Mood and Anxiety Disorders Using Reinforcement Learning
Serious and debilitating symptoms of anxiety are the most common mental health problem worldwide, accounting for around 5% of all adult years lived with disability in the developed world. Avoidance behavior (avoiding social situations for fear of embarrassment, for instance) is a core feature of such anxiety. However, as for many other psychiatric symptoms, the biological mechanisms underlying avoidance remain unclear. Reinforcement learning models provide formal and testable characterizations of the mechanisms of decision making; here, we examine avoidance in these terms....
Who Dares, Who Errs? Disentangling Cognitive and Motivational Roots of Age Differences in Decisions Under Risk
We separate for the first time the roles of cognitive and motivational factors in shaping age differences in decision making under risk. Younger and older adults completed gain, loss, and mixed-domain choice problems as well as measures of cognitive functioning and affect. The older adults' decision quality was lower than the younger adults' in the loss domain, and this age difference was attributable to the older adults' lower cognitive abilities. In addition, the older adults chose the more risky option more often than the younger adults in the gain and mixed domains;...
Placebo Intervention Enhances Reward Learning in Healthy Individuals
According to the placebo-reward hypothesis, placebo is a reward-anticipation process that increases midbrain dopamine (DA) levels. Reward-based learning processes, such as reinforcement learning, involve a large part of the DA-ergic network that is also activated by the placebo intervention. Given the neurochemical overlap between placebo and reward learning, we investigated whether verbal instructions in conjunction with a placebo intervention are capable of enhancing reward learning in healthy individuals by using a monetary reward-based reinforcement-learning task....
The Computational Development of Reinforcement Learning during Adolescence
Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants...
From Creatures of Habit to Goal-Directed Learners: Tracking the Developmental Emergence of Model-Based Reinforcement Learning
Theoretical models distinguish two decision-making strategies that have been formalized in reinforcement-learning theory. A model-based strategy leverages a cognitive model of potential actions and their consequences to make goal-directed choices, whereas a model-free strategy evaluates actions based solely on their reward history. Research in adults has begun to elucidate the psychological mechanisms and neural substrates underlying these learning processes and factors that influence their relative recruitment. However, the developmental trajectory of these evaluative...
Reduction of Pavlovian Bias in Schizophrenia: Enhanced Effects in Clozapine-Administered Patients
The negative symptoms of schizophrenia (SZ) are associated with a pattern of reinforcement learning (RL) deficits likely related to degraded representations of reward values. However, the RL tasks used to date have required active responses to both reward and punishing stimuli. Pavlovian biases have been shown to affect performance on these tasks through invigoration of action to reward and inhibition of action to punishment, and may be partially responsible for the effects found in patients. Forty-five patients with schizophrenia and 30 demographically-matched controls...
Characterizing a psychiatric symptom dimension related to deficits in goal-directed control
Prominent theories suggest that compulsive behaviors, characteristic of obsessive-compulsive disorder and addiction, are driven by shared deficits in goal-directed control, which confers vulnerability for developing rigid habits. However, recent studies have shown that deficient goal-directed control accompanies several disorders, including those without an obvious compulsive element. Reasoning that this lack of clinical specificity might reflect broader issues with psychiatric diagnostic categories, we investigated whether a dimensional approach would better delineate...
Social Influences in Sequential Decision Making
People often make decisions in a social environment. The present work examines social influence on people's decisions in a sequential decision-making situation. In the first experimental study, we implemented an information cascade paradigm, illustrating that people infer information from decisions of others and use this information to make their own decisions. We followed a cognitive modeling approach to elicit the weight people give to social as compared to private individual information. The proposed social influence model shows that participants overweight their own...
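One assumed way to formalize the weighting of social versus private information in a cascade is to combine both sources of evidence on a common log-odds-like scale, with a free weight on the social evidence. The sketch below is an illustration only, not the paper's fitted model; a social weight below one corresponds to favouring one's own information.

```python
# Minimal sketch: private evidence plus down- or up-weighted social evidence.
import numpy as np

def log_odds_choice(private_signal, others_choices, social_weight=0.6):
    """Positive output favours option A; others_choices is +1 (A) / -1 (B) per person."""
    private_evidence = private_signal                 # evidence from one's own draw
    social_evidence = social_weight * np.sum(others_choices)
    return private_evidence + social_evidence

# two predecessors chose B, but the private draw points moderately toward A
print(log_odds_choice(1.0, np.array([-1, -1]), social_weight=0.3))   # own signal wins
print(log_odds_choice(1.0, np.array([-1, -1]), social_weight=0.8))   # social info wins
```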
Anxiety-like behavioural inhibition is normative under environmental threat-reward correlations
Behavioural inhibition is a key anxiety-like behaviour in rodents and humans, distinct from avoidance of danger, and reduced by anxiolytic drugs. In some situations it is not clear how behavioural inhibition minimises harm or maximises benefit for the agent; indeed, it can even appear counterproductive. Extant explanations of this phenomenon make use of descriptive models but do not provide a formal assessment of its adaptive value. This hampers a better understanding of the neural computations underlying anxiety behaviour. Here, we analyse a standard rodent anxiety model,...
Common attentional constraints in visual foraging
Predators are known to select food of the same type in non-random sequences or “runs” that are longer than would be expected by chance. If prey are conspicuous, predators will switch between available sources, interleaving runs of different prey types. However, when prey are cryptic, predators tend to focus on one food type at a time, effectively ignoring equally available sources. This latter finding is regarded as a key indicator that animal foraging is strongly constrained by attention. It is unknown whether human foraging is equally constrained. Here, using a novel...
Temporal event structure and timing in schizophrenia: preserved binding in a longer "now"
Patients with schizophrenia experience a loss of temporal continuity or subjective fragmentation along the temporal dimension. Here, we develop the hypothesis that impaired temporal awareness results from a perturbed structuring of events in time (i.e., canonical neural dynamics). To address this, 26 patients and their matched controls took part in two psychophysical studies using desynchronized audiovisual speech. Two tasks were used and compared: first, an identification task testing for multisensory binding impairments in which participants reported what they heard...
The dynamics of decision making in risky choice: an eye-tracking analysis
In recent years, research on risky choice has moved beyond analyzing choices alone. Models have been suggested that aim to describe the underlying cognitive processes, and some studies have tested process predictions of these models. Prominent approaches are evidence accumulation models such as decision field theory (DFT), simple serial heuristic models such as the adaptive toolbox, and connectionist approaches such as the parallel constraint satisfaction (PCS) model. In two studies involving measures of attention and pupil dilation, we investigate hypotheses derived...
Neural prediction errors reveal a risk-sensitive reinforcement learning process in the human brain
Humans and animals are exquisitely, though idiosyncratically, sensitive to risk or variance in the outcomes of their actions. Economic, psychological, and neural aspects of this are well studied when information about risk is provided explicitly. However, we must normally learn about outcomes from experience, through trial and error. Traditional models of such reinforcement learning focus on learning about the mean reward value of cues and ignore higher order moments such as variance. We used fMRI to test whether the neural correlates of human reinforcement learning are...
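A learner that is sensitive to outcome variance can be sketched by tracking a second moment alongside the usual mean, and penalizing value by the learned spread. The mean-variance form and the parameters below are assumptions for illustration, not the model tested in the paper.

```python
# Minimal sketch: track the mean and variance of a cue's outcomes, then
# discount value by the learned risk. Parameters are hypothetical.
import numpy as np

alpha, risk_aversion = 0.1, 0.5
mean, variance = 0.0, 1.0

def risk_sensitive_update(mean, variance, reward):
    pe = reward - mean
    mean += alpha * pe                         # first moment, as in standard TD
    variance += alpha * (pe ** 2 - variance)   # second moment: outcome variability
    return mean, variance

rng = np.random.default_rng(4)
for reward in rng.normal(1.0, 2.0, size=200):  # risky cue: same mean, high spread
    mean, variance = risk_sensitive_update(mean, variance, reward)

subjective_value = mean - risk_aversion * np.sqrt(variance)
print(round(mean, 2), round(float(np.sqrt(variance)), 2), round(float(subjective_value), 2))
```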
Cognitive models of risky choice: Parameter stability and predictive accuracy of prospect theory
In the behavioral sciences, a popular approach to describe and predict behavior is cognitive modeling with adjustable parameters (i.e., which can be fitted to data). Modeling with adjustable parameters allows, among other things, measuring differences between people. At the same time, parameter estimation also bears the risk of overfitting. Are individual differences as measured by model parameters stable enough to improve the ability to predict behavior as compared to modeling without adjustable parameters? We examined this issue in cumulative prospect theory (CPT),...
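For reference, the adjustable-parameter machinery of CPT centres on a value function with loss aversion and an inverse-S probability weighting function. The sketch below uses one common parameterisation with placeholder values; in the study itself these parameters are estimated per participant, which is exactly where the stability question arises.

```python
# Minimal sketch of CPT's core transformations (placeholder parameter values).
import numpy as np

def value_function(x, alpha=0.88, lam=2.25):
    """Concave for gains, convex and steeper (loss aversion) for losses."""
    x = np.asarray(x, dtype=float)
    gains = np.clip(x, 0, None) ** alpha
    losses = -lam * np.clip(-x, 0, None) ** alpha
    return np.where(x >= 0, gains, losses)

def weighting_function(p, gamma=0.61):
    """Inverse-S probability weighting: overweights small probabilities."""
    return p ** gamma / ((p ** gamma + (1 - p) ** gamma) ** (1 / gamma))

# subjective value of a simple gamble: win 100 with p = .05, otherwise 0
p = 0.05
print(weighting_function(p) * value_function(100.0))
```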