Online learning has become an integral part of modern student life, with technological advances granting students access to a vast amount of educational content. There have also been many recent efforts to personalize practice according to the needs and abilities of individual students. These developments provide an opportunity to reduce educational inequality, an especially relevant concern given the importance of remote learning during COVID.
Many researchers have investigated how to design online learning systems that incorporate cognitive science principles known to improve learning. Critical principles include spacing practice over time (rather than cramming) and self-testing. Additionally, imposing some difficulty has been found to benefit learning. But how much spacing? What content should be practiced next? How difficult should it be? Answers to these questions have been vague. A common answer, both in research and in popular learning systems (e.g., Duolingo), has been to have students practice whatever they are about to forget. In other words, high difficulty is encouraged. But even setting aside the motivational effects of such a strategy, is it actually the most efficient approach?
Our research study, “Optimizing practice scheduling requires quantitative tracking of individual item performance,” showed how optimally efficient practice can be scheduled automatically using a quantitative model of learning and a difficulty threshold. We hypothesized that practicing the hardest items would not be a universally optimal strategy for one simple reason: harder items are more likely to be answered incorrectly, and incorrect answers are frequently more time-consuming because students must review corrective feedback.
To determine what difficulty was optimal, we developed a quantitative model that tracked student learning, accounting for the effects of practice and spacing. We then simulated how much students would learn after completing practice sessions set at many different difficulty thresholds. For instance, how much would a student learn if we had them practice whatever content the model predicted they had an 80% chance of answering correctly (the difficulty threshold here being 80%)?
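The scheduling idea can be sketched in a few lines. This is a minimal illustration, not the model from the study: it assumes a toy exponential-forgetting curve, and the function and field names (`predicted_recall`, `strength`, `elapsed_s`) are hypothetical.

```python
import math

def predicted_recall(elapsed_s, strength):
    """Toy forgetting model: recall probability decays with time since
    last practice; higher 'strength' slows the decay. (Illustrative
    only -- the study used a richer model of practice and spacing.)"""
    return math.exp(-elapsed_s / strength)

def next_item(items, threshold=0.80):
    """Pick the item whose predicted recall is closest to the
    difficulty threshold (e.g., an 80% chance of a correct answer)."""
    return min(items,
               key=lambda it: abs(predicted_recall(it["elapsed_s"],
                                                   it["strength"]) - threshold))

items = [
    {"word": "neko", "elapsed_s": 30,  "strength": 300},  # ~0.90 recall
    {"word": "inu",  "elapsed_s": 300, "strength": 300},  # ~0.37 recall
    {"word": "mizu", "elapsed_s": 70,  "strength": 300},  # ~0.79 recall
]
print(next_item(items, threshold=0.80)["word"])  # -> mizu
```

The key design point is that the scheduler never asks "which item is hardest?" but rather "which item currently sits at the target difficulty?"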
We simulated the outcomes of this approach with thousands of simulated students learning Japanese-English vocabulary at each difficulty threshold between 0 and 99%. What we found was counterintuitive, and unlike previous recommendations from researchers and learning technology companies (e.g., Duolingo): introducing a small amount of difficulty (e.g., a 90% probability of answering correctly) was better than a large amount (e.g., 40%). Part of the reason is that correct answers happen much faster, so it was more productive to practice many easy trials (and work up to harder content) than to focus on the harder content first (and perhaps never get to the easy items). Students’ learning per second was much better overall when the difficulty threshold was set high, that is, when practice was relatively easy. Finally, we tested the simulation’s predictions with real student participants who practiced at several difficulty thresholds (including the optimal one) and verified that lower difficulty did indeed lead to superior recall on a final test completed a few days later.
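The time-cost argument is simple arithmetic. The numbers below are made up for illustration (the study measured actual response and feedback times), but they show why easy practice can pack more trials into the same study time:

```python
def trials_per_minute(p_correct, t_correct=4.0, t_error=12.0):
    """Expected practice trials completed per minute, assuming a
    correct answer takes ~4 s and an error ~12 s (the extra time is
    spent reviewing corrective feedback; times are illustrative)."""
    expected_trial_time = p_correct * t_correct + (1 - p_correct) * t_error
    return 60.0 / expected_trial_time

# Easier practice (90% correct) completes nearly twice as many
# trials per minute as harder practice (40% correct):
print(trials_per_minute(0.90))  # 12.5
print(trials_per_minute(0.40))  # ~6.8
```

Whether those extra trials translate into more learning depends on how much each trial teaches, which is exactly what the fitted model and simulations were used to estimate.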
A major takeaway from our study is that efficiency is vital to optimally designing educational technology. Students have limited time to study, so the total time practice takes is a crucial consideration. In fact, we found that practicing efficiently had an even larger effect on memory than spacing typically does! Practicing according to a specific difficulty threshold also answers the question, “What should the student practice next?” Our paper also provides a general roadmap for implementing this approach in future research and educational systems. There are three steps: 1) collect a dataset purely for fitting the learner model (one that introduces a variety of practice contexts so the model generalizes and properly estimates spacing effects), 2) simulate practice at a variety of difficulty thresholds, and 3) test how well the simulation’s predictions bear out in an experiment.
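Step 2 of that roadmap amounts to a sweep over candidate thresholds. A minimal sketch, in which `simulate_learning` stands in for a simulation built on the fitted learner model from step 1 (the interface and the toy curve below are hypothetical):

```python
def best_threshold(simulate_learning, thresholds):
    """Simulate a full practice session at each candidate difficulty
    threshold and return the one that maximizes predicted learning."""
    return max(thresholds, key=simulate_learning)

def toy_simulation(threshold):
    # Stand-in for a model-based simulation: a curve that happens to
    # peak near an easy (90%-correct) threshold, echoing the finding.
    return -(threshold - 0.9) ** 2

candidates = [t / 100 for t in range(0, 100, 5)]  # 0.00, 0.05, ..., 0.95
print(best_threshold(toy_simulation, candidates))  # 0.9
```

Step 3 then checks the winning threshold (and a few others) against real students, as the experiment above did.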
Critically, the optimal difficulty threshold will differ across learning contexts. Sometimes corrective feedback may be especially beneficial for mastering a topic, and thus getting an answer wrong may be more efficient. Reflecting on why you got something wrong matters to differing degrees depending on the topic (not so much for vocabulary learning, but quite a bit for Anatomy and Physiology). We are currently evaluating our approach with nursing students learning Anatomy and Physiology content and have indeed found that higher difficulty can be more efficient there. We hope our work offers a roadmap for optimally scheduling practice within online educational systems.