Sensitivity to positive and negative feedback shapes our behaviour, sometimes in predictable ways. Two ways in which current outcomes shape later actions are captured by the mechanics of win-stay and lose-shift. A student who receives the highest grade for their work is more likely to repeat the study behaviours that helped them arrive at that positive result. In other words, positive outcomes are more likely to lead to behavioural repetition (win-stay). For a driver reprimanded for going too fast on the highway, a speeding ticket makes it more likely that they will do something other than speed next time. In other words, negative outcomes are more likely to lead to behavioural change (lose-shift).
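To make the two mechanics concrete, here is a minimal sketch of how win-stay and lose-shift rates might be counted from a sequence of choices. The function name `stay_shift_rates`, the outcome labels ("win", "loss", "draw") and the decision to ignore draws are illustrative assumptions, not the analysis used in the article.

```python
from typing import Dict, Optional, Sequence


def stay_shift_rates(
    moves: Sequence[str], outcomes: Sequence[str]
) -> Dict[str, Optional[float]]:
    """Estimate P(stay | win) and P(shift | loss) from consecutive trials.

    moves[t] is the action chosen on trial t and outcomes[t] is its result.
    Draws are ignored here (an assumption made for this sketch).
    """
    if len(moves) != len(outcomes):
        raise ValueError("moves and outcomes must have the same length")

    win_total = win_stay = loss_total = loss_shift = 0
    # The last trial has no following action, so it cannot be scored.
    for t in range(len(moves) - 1):
        stayed = moves[t + 1] == moves[t]
        if outcomes[t] == "win":
            win_total += 1
            win_stay += stayed
        elif outcomes[t] == "loss":
            loss_total += 1
            loss_shift += not stayed

    return {
        "win_stay": win_stay / win_total if win_total else None,
        "lose_shift": loss_shift / loss_total if loss_total else None,
    }


# Hypothetical five-trial sequence, for illustration only.
rates = stay_shift_rates(
    moves=["R", "R", "P", "S", "S"],
    outcomes=["win", "win", "loss", "loss", "win"],
)
print(rates)
```

A rate near 1 on either measure would indicate that the player follows the corresponding rule almost every time; a rate near chance would indicate that the outcome has little influence on the next choice.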
Recently, Yechiam, Zahavi & Arditi (2015) showed that these connections between outcome and action produce shockwaves that influence behaviour well into the future. They identified win-calmness, in which individuals continue to stay in a future task after winning in a current task, and loss-restlessness, in which individuals continue to shift in a future task after losing in a current task. We were interested in how these longer-form expressions of reinforcement learning come into being, and proposed the following. If tendencies to stay and shift carry over between tasks, then these biases must exist before the start of the future task. Furthermore, these biases must originate in micro-transactions within the current task.
We used the simple game of Rock, Paper, Scissors to test these ideas. This game is methodologically useful because it is easy to manipulate and fast and intuitive to play. Across 10 experiments testing 392 participants, we established a number of competitive contexts involving different opponent types. For example, we could guarantee that the opponent could not be beaten (unexploitable), we could programme biases into the opponent so they could be beaten (exploitable), and we analysed biases generated by the player so that participants themselves could be beaten (exploiting). In these ways, we were able to see which environments allow for the expression of win-calmness and loss-restlessness.
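The difference between an unexploitable and an exploitable opponent can be illustrated with a short simulation. This is only a sketch under stated assumptions: the unexploitable opponent is modelled as uniformly random play, the exploitable opponent as favouring one item 60% of the time, and all function names and parameter values are hypothetical rather than taken from the experiments.

```python
import random

MOVES = ("rock", "paper", "scissors")
# Each key beats its value.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def outcome(player: str, opponent: str) -> str:
    """Return 'win', 'loss' or 'draw' from the player's point of view."""
    if player == opponent:
        return "draw"
    return "win" if BEATS[player] == opponent else "loss"


def unexploitable_opponent(rng: random.Random) -> str:
    """Assumed model: uniformly random play, which no fixed strategy can beat on average."""
    return rng.choice(MOVES)


def exploitable_opponent(
    rng: random.Random, favoured: str = "rock", bias: float = 0.6
) -> str:
    """Assumed model: plays `favoured` with probability `bias`, otherwise one of the others."""
    weights = [bias if m == favoured else (1 - bias) / 2 for m in MOVES]
    return rng.choices(MOVES, weights=weights, k=1)[0]


def win_rate(opponent, player_move: str = "paper", n_trials: int = 10_000, seed: int = 0) -> float:
    """Win rate of a player who repeats one move against the given opponent."""
    rng = random.Random(seed)
    wins = sum(outcome(player_move, opponent(rng)) == "win" for _ in range(n_trials))
    return wins / n_trials


print("vs unexploitable:", win_rate(unexploitable_opponent))
print("vs exploitable:  ", win_rate(exploitable_opponent))
```

Under these assumptions, repeating "paper" wins roughly a third of the time against the random opponent but well over half the time against the rock-biased one, which is the kind of environment in which behavioural repetition pays off.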
Expressions of win-calmness and loss-restlessness were heavily reliant on the competitive context. When participants played against unexploitable opponents, or against opponents who were exploiting the player, there was little evidence that winning or losing modulated the long-term repetition or change of behaviour. However, when participants were able to maximize their wins against exploitable opponents, we saw not only a sharp increase in behavioural repetition but also that this repetition was most likely following positive outcomes (win-calmness). We also examined performance at an individual level, and showed that the higher the win rate experienced by the participant, the higher the degree of win-calmness.
Our data show that the experience of winning and losing generates two very different states in the individual. Although repeating actions following a win and changing actions following a loss would appear to be complementary sides of the same coin, the flexibility of win-stay and that of lose-shift are quite different. The individual may feel relatively safe enjoying the exploitation of their opponent via behavioural repetition, but must engage in less-predictable, explorative behaviour when their opponent cannot be exploited, or when the individual runs the risk of being exploited themselves. These data are consistent with other findings in which win-stay and lose-shift mechanisms are anatomically, evolutionarily and behaviourally independent. Moreover, they point towards new and exciting avenues of research in which wider ranges of responses and opponents are experienced. In this way, laboratory data can begin to connect with the complex, continuous nature of the decision-making we deploy in the real world.
To learn more, read our free and open access article: A micro-genesis account of longer-form reinforcement learning in structured and unstructured environments, published by npj Science of Learning.