Machine learning-based instruction helps people memorize more effectively

Scheduling reviews that optimally adapt to each learner improves retention

Remembering is one of the core objectives of learning and studying. This is true for those who are studying vocabulary for a foreign language and for those who are learning the rules of the road to pass their driving test. The benefits of the spacing effect are indisputable: instead of cramming for exams, it is much better for long-term retention to review the material at spaced intervals. This was first pointed out in the 1890s, and since then science has come a long way in describing the effect of passing time on memory ever more accurately.

However, the above observation does not necessarily lend itself to easy practice. There is a long way between knowing that spacing is good for retention and practically designing a pedagogical structure around it; for example, textbooks written for students still do not usually account for spaced repetition in any way. People have long tried to put the spacing effect into practice, starting with physical flashcards and Leitner boxes.
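The Leitner system mentioned above can be sketched as a simple promotion/demotion rule: a correct answer moves a card to the next box, which is reviewed less frequently, while a mistake sends it back to the first box. The specific intervals below are illustrative, not canonical:

```python
# Toy Leitner-box scheduler. The review intervals (in days) per box
# are an illustrative choice; real setups vary.
INTERVALS = {1: 1, 2: 2, 3: 4, 4: 8, 5: 16}
MAX_BOX = 5

def update_box(box, correct):
    """Promote the card one box on a correct answer, demote to box 1 on a mistake."""
    return min(box + 1, MAX_BOX) if correct else 1

def due(box, days_since_review):
    """A card is due once its box's review interval has elapsed."""
    return days_since_review >= INTERVALS[box]
```

The key property is that well-remembered cards naturally drift toward the rarely reviewed boxes, so study time concentrates on the items the learner struggles with.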

Now, in the digital era, with ubiquitous computers and learning apps (e.g., Anki and Duolingo), one can keep detailed records of when learners review each learning item and how good their recall was. These apps and websites use various heuristics to optimize the teaching schedule for each learner, and the heuristics often intuitively incorporate some notion of the spacing effect.

In our new research study, "Large-scale randomized experiment reveals that machine learning-based instruction helps people memorize more effectively", we formalized the problem and presented a theoretically grounded optimal solution that provides a basis for the intuition behind those heuristics.

In previous work, we showed how to mathematically model a simplified version of the cognitive processes behind forgetting and uncover the optimal time to study each item for a learner, given their past reviews. However, with learning apps, learners are in control of their learning experience: while knowing the optimal study time is useful, it is unlikely that a learner will initiate a session exactly when the time is optimal. Hence, we designed ‘Select’, an algorithm that selects the optimal questions for users to study whenever they choose to initiate a session. ‘Select’ is a data-driven algorithm whose parameters (the adjustments for correct/incorrect recall and the initial difficulty of items) are learned from past tests that users have performed on the app. The algorithm follows the intuition of prioritizing the review of items that are closest to being forgotten. It adapts to learners by keeping track of individualized forgetting rates per learner and per item, and it can be easily deployed with any learning app where learners initiate the learning sessions.
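The intuition of "review the items closest to being forgotten" can be sketched in a few lines of code. This is a simplified illustration, not the paper's implementation: it assumes an exponential forgetting model, p(recall) = exp(−n·Δt), with hypothetical constant update factors for correct and incorrect recalls, whereas in the actual algorithm the corresponding parameters are learned from data:

```python
import math

class SelectSketch:
    """Toy 'closest-to-forgotten' selector (illustrative, not the paper's code)."""

    def __init__(self, initial_rate=0.1, correct_factor=0.5, incorrect_factor=2.0):
        self.rates = {}        # (learner, item) -> forgetting rate n
        self.last_review = {}  # (learner, item) -> time of last review
        self.initial_rate = initial_rate          # assumed initial item difficulty
        self.correct_factor = correct_factor      # slow forgetting after a correct recall
        self.incorrect_factor = incorrect_factor  # speed it up after a lapse

    def recall_probability(self, learner, item, now):
        """p(recall) = exp(-n * time since last review), per learner and item."""
        key = (learner, item)
        rate = self.rates.get(key, self.initial_rate)
        elapsed = now - self.last_review.get(key, now)
        return math.exp(-rate * elapsed)

    def select(self, learner, items, now, k=1):
        """Pick the k items with the lowest estimated recall probability."""
        ranked = sorted(items, key=lambda i: self.recall_probability(learner, i, now))
        return ranked[:k]

    def record_review(self, learner, item, now, correct):
        """Update the individualized forgetting rate after a review outcome."""
        key = (learner, item)
        rate = self.rates.get(key, self.initial_rate)
        factor = self.correct_factor if correct else self.incorrect_factor
        self.rates[key] = rate * factor
        self.last_review[key] = now
```

For example, if a learner last answered item ‘a’ correctly and item ‘b’ incorrectly, ‘b’ accrues a higher forgetting rate, so a later call to `select` surfaces ‘b’ first.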

We deployed our algorithm via a popular app that helps users prepare for the driving license exam (iTheorie Führerschein, https://swift.ch). We conducted a randomized experiment to compare ‘Select’ against two baseline algorithms. One baseline selected the question to pose to the user uniformly at random (named ‘random’), while the other presented questions in ascending order of previously correct answers and difficulty, in a round-robin fashion (named ‘difficulty’). The experiment was run with about 50,000 learners, making it one of the largest studies of its kind.

In the graph, we can see the relative decrease in the forgetting rate of items being studied by the learners, compared to when they first saw each item (lower is better). Each triplet of bars in the figure corresponds to (learner, question) pairs in which the learner reviewed the question the same number of times (# reviews) over approximately the same period of time (T). Boxes indicate the 25% and 75% quantiles and crosses indicate median values. ∗ indicates a statistically significant difference (Mann-Whitney U-test, 2-sided; p-value = 0.05/36, Bonferroni correction).

We found that learners who were given questions to review as recommended by ‘Select’ did improve their retention, while controlling for the number of days they had used the app and the number of reviews they had done. In particular, learners who were assigned questions by ‘Select’ remembered the content ~69% longer than those in the ‘difficulty’ baseline. We also found an interesting interaction between engagement and the choice of learning algorithm that can be explored further: more of the learners who were given the ML-driven reviews stopped using the app in the initial 3 days, but those who continued using it were more engaged than learners receiving reviews from the other algorithms.

Our results have direct implications for the learning of large sets of paired-associate items by young learners using machine learning-based instruction. However, more research at the intersection of cognitive science and machine learning is needed to generalize our results to different populations of learners, different materials and tasks, and to the question of learner engagement.

To learn more, read our research paper, "Large-scale randomized experiment reveals that machine learning-based instruction helps people memorize more effectively", published by npj Science of Learning.

* Our dataset and analysis are available at Networks-Learning/spaced-selection. 

Utkarsh Upadhyay

Co-founder / CTO, Reasonal