Reading notes #3: continual learning

The world is complex and ever-changing. Hence, we have to constantly adapt to it by learning new things in order to keep up, whether it is new technologies, new crises, new laws, or new cultures. The world is a dynamic place where standing still actually means going backward.

Artificial intelligence (AI) systems have the same issue. Most AI systems are trained to solve a task through exposure to real-life observations. But once the AI is trained, the world keeps on changing, and the AI needs to keep up in order to keep performing well. For example, an AI trained to detect spam in emails using data from 50 years ago would have a hard time doing its job today.

Thus, continual learning is a trending field in AI research. Its goal is to develop protocols for training models continuously so that they keep up with a changing world. However, this is not as simple as continuously feeding a model new data. The big problem of this field is the phenomenon of catastrophic forgetting: AI systems tend to forget old knowledge while incorporating new knowledge. So far, many attempts have been made to alleviate this issue, but with limited success.
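To make catastrophic forgetting concrete, here is a minimal sketch (not taken from any of the papers below; the tasks and data are made up for illustration). A small PyTorch network is trained on a toy task A, then on a toy task B, after which its accuracy on task A collapses back to chance:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_task(feature):
    """Toy binary task: the label is the sign of a single input feature."""
    x = torch.randn(500, 2)
    y = (x[:, feature] > 0).long()
    return x, y

def accuracy(x, y):
    return (model(x).argmax(dim=1) == y).float().mean().item()

model = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

def train(x, y, steps=300):
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

task_a, task_b = make_task(0), make_task(1)

train(*task_a)
print("task A accuracy after training on A:", accuracy(*task_a))  # near 1.0

train(*task_b)  # naive sequential training: task A data is never replayed
print("task A accuracy after training on B:", accuracy(*task_a))  # near 0.5
```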

Reading notes

Through my exploration of the scientific literature, I came across a few interesting articles on the subject of continual learning. Let's go through them one by one:

Towards Continual Knowledge Learning of Language Models

This paper discusses the need for continual learning in language models. Such models are trained to understand human language, which allows us to perform machine translation, create chatbots, or study dead languages. As the world changes, the words we use to describe it change as well. Hence, language models need to be constantly updated to account for new words, new expressions, and shifts in the meaning of existing words. However, there is also time-invariant knowledge, such as syntax, that should not be forgotten when updating the model. Consequently, the authors developed a benchmark to measure the amount of knowledge gained and forgotten during an update. This paper is a great starting point for the study of continual learning in NLP.
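The paper's actual benchmark is much richer than this, but the underlying bookkeeping can be sketched simply: probe the model on the same set of facts before and after an update and count how many answers flipped. The function and probe names below are hypothetical:

```python
def knowledge_change(before, after):
    """Compare correctness on the same probes before and after an update.

    `before` and `after` map a probe id to whether the model answered it
    correctly (True/False). Returns the fraction of probes forgotten
    (right -> wrong) and the fraction gained (wrong -> right).
    """
    probes = before.keys() & after.keys()
    forgotten = sum(before[p] and not after[p] for p in probes) / len(probes)
    gained = sum(not before[p] and after[p] for p in probes) / len(probes)
    return forgotten, gained

# Hypothetical probes: the first two are time-invariant knowledge that should
# survive an update; the last is new knowledge the update should add.
before = {"capital_of_france": True, "basic_syntax": True, "new_slang": False}
after  = {"capital_of_france": True, "basic_syntax": False, "new_slang": True}
print(knowledge_change(before, after))  # (0.33..., 0.33...)
```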

Robust Continual Learning through a Comprehensively Progressive Bayesian Neural Network

This paper proposes a dynamic-architecture approach to continual learning. The authors developed a framework in which a neural network can grow or shrink throughout the learning process, so that it uses its resources efficiently while alleviating catastrophic forgetting. Their experiments show that such dynamic methods outperform static architectures.
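The paper's progressive Bayesian machinery is beyond a short note, but the core idea of growing a network while preserving what it already knows can be sketched as follows (an illustrative helper of my own, not the authors' implementation):

```python
import torch
import torch.nn as nn

def widen_linear(layer: nn.Linear, extra: int) -> nn.Linear:
    """Return a copy of `layer` with `extra` new output neurons.

    The old weights are copied over so existing knowledge is preserved;
    the new neurons start small and provide fresh capacity for the next
    task. (Shrinking would, conversely, drop low-utility neurons.)
    """
    new = nn.Linear(layer.in_features, layer.out_features + extra)
    with torch.no_grad():
        new.weight[: layer.out_features] = layer.weight
        new.bias[: layer.out_features] = layer.bias
        new.weight[layer.out_features :] *= 0.01  # small init for new units
    return new

hidden = nn.Linear(2, 16)
hidden = widen_linear(hidden, 8)  # grow capacity before learning a new task
print(hidden)  # Linear(in_features=2, out_features=24, bias=True)
# In a full network, the next layer's input size must grow to match.
```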

Wide Neural Networks Forget Less Catastrophically 

Architecture Matters in Continual Learning 

These two papers study the effect of various architectures and training techniques on catastrophic forgetting in continual learning. The authors observed that the width and depth of neural networks, as well as the use of normalisation and pooling layers, have very significant effects. Such results can help in designing better models and lead the way to more investigation into the sources of catastrophic forgetting.
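The experiments in these papers are far more extensive, but the basic setup is easy to sketch: parameterize the architecture by width and depth and rerun the same sequential-training experiment for each variant. A minimal builder, assuming the toy tasks from the first sketch above:

```python
import torch.nn as nn

def mlp(width: int, depth: int, in_dim: int = 2, out_dim: int = 2) -> nn.Sequential:
    """Build an MLP of a given width and depth so that forgetting can be
    compared across architectures."""
    layers, d = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(d, width), nn.ReLU()]
        d = width
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)

narrow_deep = mlp(width=16, depth=4)
wide_shallow = mlp(width=512, depth=1)
# Train each sequentially on task A then task B (as in the first sketch)
# and compare how much task A accuracy drops for each architecture.
```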

Is Class-Incremental Enough for Continual Learning?

This paper challenges the current trend in the continual learning literature of experimenting mainly on class-incremental scenarios, in which classes present in one experience are never revisited later; in other words, old data is never shown to the model again. The authors posit that an excessive focus on this setting may be limiting for future research on continual learning. Specifically, class-incremental scenarios artificially exacerbate catastrophic forgetting and do not represent real-life situations. They advocate for a more in-depth study of alternative continual learning scenarios in which repetition of old data is integrated by design.
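The contrast between the two scenarios is easy to sketch. Below, a class-incremental stream presents each class exactly once, while the alternative stream the authors advocate mixes new classes with revisits of old ones (the helper names are mine, not the paper's):

```python
import random

def class_incremental(classes, n_experiences):
    """Disjoint experiences: once a class has been seen, it never returns."""
    per = len(classes) // n_experiences
    return [classes[i * per:(i + 1) * per] for i in range(n_experiences)]

def with_repetition(classes, n_experiences, new_per_exp=2, revisits=2):
    """Each experience introduces new classes but also revisits old ones."""
    seen, stream = [], []
    for i in range(n_experiences):
        new = classes[i * new_per_exp:(i + 1) * new_per_exp]
        old = random.sample(seen, min(revisits, len(seen)))
        stream.append(new + old)
        seen += new
    return stream

classes = list(range(10))
print(class_incremental(classes, 5))  # [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
print(with_repetition(classes, 5))    # old classes reappear alongside new ones
```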

How to Learn when Data Gradually Reacts to Your Model 

As we have seen, models may become obsolete with time. However, they may also affect future data. This paper investigates situations in which the data reacts to a model's deployment. The authors argue that in some cases the model makes decisions that can affect future data. For example, on YouTube, creators learn what the recommendation algorithm likes and may change their content accordingly. Consequently, the model should be trained to account for its own impact on the data, and the authors propose a method to improve learning in this setting.
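The authors' method is more sophisticated, but the setting itself is easy to simulate: deploy a model, let the data distribution drift in response, retrain, and repeat. In the toy loop below (all numbers invented, and naive retraining standing in for the paper's algorithm), the process converges to a point where the model and the reacting data agree:

```python
import numpy as np

rng = np.random.default_rng(0)

BASE_MEAN = 1.0   # where the data would sit if no model were deployed
REACTIVITY = 0.5  # how strongly the population drifts toward the model

def sample_data(theta, n=100_000):
    """The deployed model parameter `theta` shifts the data distribution,
    a stand-in for creators adapting their content to the algorithm."""
    return rng.normal(BASE_MEAN + REACTIVITY * theta, 1.0, size=n)

theta = 0.0  # initial model: just estimate the mean of the data
for step in range(10):
    data = sample_data(theta)  # the world reacts to the deployed model
    theta = data.mean()        # naive retraining on the reacted data
    print(f"step {step}: theta = {theta:.3f}")

# theta converges to BASE_MEAN / (1 - REACTIVITY) = 2.0: a stable point
# where the model's predictions and the data's reaction to them agree.
```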

Judicael Poumay (Ph.D.)