Social learning through prediction error in the brain
From the moment we are born, we learn by observing other people. Our understanding of how our brains achieve this is much weaker than our knowledge of how self-oriented learning occurs. In a review article recently published in npj Science of Learning, a group from Yale University suggests that observational learning may in fact be quite similar to reinforcement learning – a ubiquitous form of learning in which an individual learns from the consequences of its own actions.
Is observational learning a type of reinforcement learning?
Studying reinforcement learning in animals often involves reward-based tasks. In a simple example, a mouse can press either of two levers, only one of which delivers a reward (typically food or water). At first, a reasonable mouse would predict the same outcome regardless of which lever it presses. It soon discovers, however, that this prediction is wrong: the actual outcome differs from the expected outcome. This difference, the prediction error, is used to update behaviour on a trial-by-trial basis.
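The trial-by-trial update described above can be sketched in a few lines of Python. This is only an illustrative toy, not the model used in the review: it assumes a simple delta-rule update (expected value nudged by a fraction of the prediction error) and an occasional random lever press for exploration. The function name, the learning rate, the exploration rate and the reward values are all hypothetical choices made for the example.

```python
import random

def simulate_two_lever_task(n_trials=200, learning_rate=0.1, epsilon=0.1, seed=0):
    """Toy simulation of a mouse learning which of two levers is rewarded."""
    rng = random.Random(seed)
    reward_for_lever = [1.0, 0.0]   # assumption: only lever 0 delivers a reward
    expected = [0.5, 0.5]           # the mouse starts with the same prediction for both levers

    for _ in range(n_trials):
        if rng.random() < epsilon:
            # Assumption: the mouse occasionally presses a lever at random.
            lever = rng.randrange(2)
        else:
            # Otherwise it presses the lever it currently values most (ties broken at random).
            best = max(expected)
            lever = rng.choice([i for i, v in enumerate(expected) if v == best])

        outcome = reward_for_lever[lever]
        prediction_error = outcome - expected[lever]          # actual minus expected outcome
        expected[lever] += learning_rate * prediction_error   # update the prediction

    return expected

if __name__ == "__main__":
    print(simulate_two_lever_task())
```

Under these assumptions, the expected value of the rewarded lever climbs towards 1 while the expected value of the unrewarded lever can only fall, so the simulated mouse ends up pressing the rewarded lever on most trials.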
The main question addressed in the review by Joiner and colleagues is whether social learning (specifically, learning by observing others) is a specialized form of reinforcement learning. When we observe another person, we (often subconsciously) try to predict what outcome their actions will produce. Next, we observe what happens, and we learn from how well the result matches our prediction (i.e. from the prediction error). According to Joiner and colleagues, the prediction-error mechanism that drives reinforcement learning might therefore also drive observational learning.
To provide evidence for this, most of the review is devoted to comparing the brain circuits activated during the two forms of learning, drawing on both human and animal studies. Before this, however, the authors outline historical views on ‘representing other’ and introduce the concepts of observational learning, reinforcement learning and prediction error.
Brain circuitry underlying observational learning
The authors discuss the activity of different areas of the brain during observation of other people (or animals).
For example, the striatum is a key component of dopamine-mediated reward learning. In one study cited in the review, rats observed another rat getting a reward, leading to increased dopamine levels in the observer rat’s striatum; no increase was seen if the reward was delivered to an empty box. This supports the idea that observation can activate reward circuits, similar to what happens during reinforcement learning.
Another part of the brain, the anterior cingulate cortex (ACC), seems to be important for distinguishing self from other. This is necessary during observational learning, and Joiner and colleagues discuss evidence that different parts of the ACC become active during self-reward and reward of an observed other.
The prefrontal cortex appears to be involved in signalling prediction errors, with different subregions signalling errors in predicting the actions of others and errors in predicting the outcomes of those actions. Prediction errors are a key part of reinforcement learning. The fact that neural signatures of prediction errors can be seen during observational learning suggests that it does indeed share similarities with reinforcement learning.
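To make the two kinds of error concrete, here is a hedged Python sketch of an observer that never acts itself but watches a demonstrator press levers. It is not taken from the review: it simply assumes that both the observer's guess about what the demonstrator will do and its guess about what each lever yields are updated with the same delta rule. The function name, the demonstrator's choice probabilities and the reward values are hypothetical.

```python
import random

def observe_demonstrator(n_trials=300, learning_rate=0.1, seed=1):
    """Toy observer that learns purely by watching another animal."""
    rng = random.Random(seed)
    reward_for_lever = [1.0, 0.0]    # assumption: only lever 0 is rewarded
    demo_choice_prob = [0.8, 0.2]    # assumption: the demonstrator mostly presses lever 0

    predicted_choice = [0.5, 0.5]    # observer's prediction of the demonstrator's action
    predicted_outcome = [0.5, 0.5]   # observer's prediction of each lever's outcome

    for _ in range(n_trials):
        lever = 0 if rng.random() < demo_choice_prob[0] else 1

        # Action prediction error: which lever was pressed versus what was predicted.
        for i in range(2):
            pressed = 1.0 if i == lever else 0.0
            action_error = pressed - predicted_choice[i]
            predicted_choice[i] += learning_rate * action_error

        # Outcome prediction error: what the demonstrator received versus what was predicted.
        outcome = reward_for_lever[lever]
        outcome_error = outcome - predicted_outcome[lever]
        predicted_outcome[lever] += learning_rate * outcome_error

    return predicted_choice, predicted_outcome

if __name__ == "__main__":
    choice, outcome = observe_demonstrator()
    print("predicted choice:", choice)
    print("predicted outcome:", outcome)
```

In this sketch the observer ends up with a high expected outcome for the rewarded lever without ever pressing it, which is the sense in which observational learning can be framed as reinforcement learning driven by someone else's experience.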
It should be stressed that some forms of social learning may not fit into a reinforcement learning framework. For example, learning someone’s personality, or acquiring knowledge about how you react in a social context, probably does not rely on reward processing. Despite this caveat, the review advances an intriguing argument that reinforcement learning – an extensively studied model that now has applications in artificial intelligence – also underlies crucial components of social learning.