A Journey into the Latent Space of Continual Learning Models
Abstract: With advancements in technology and increasing computational power, the field of Deep Learning has experienced significant growth over the past decade. Deep Neural Networks (DNNs) have achieved remarkable performance across various tasks, setting state-of-the-art benchmarks in image, video, audio, and text domains. These models excel at learning complex patterns from data but require a substantial number of labeled examples to reach their full potential. However, a key difference between humans and DNNs is the ability to remember. While humans can continuously learn and adapt to new tasks without forgetting past knowledge, DNNs tend to overwrite previously learned information when trained on new data. This phenomenon, known as Catastrophic Forgetting, hinders the continual learning capability of these models. Consequently, updating a model with new information often necessitates expensive re-training on all data, which is not always feasible in real-world scenarios. The field of Continual Learning (CL) aims to address this issue by developing strategies that enable models to retain past knowledge while learning new tasks. The CL literature has expanded significantly in recent years, with various approaches involving rehearsal, regularization, distillation, and architectural modifications. Despite these advancements, the gap between continual learning and joint training persists, and the problem of Catastrophic Forgetting remains unsolved. This thesis begins by examining the latent space (the model's internal data representations) during the continual learning process and studying how it evolves over time. This analysis led to the development of several strategies that mitigate forgetting by acting on the latent space of DNNs, contributing to the CL literature. Two of these methods, named CaSpeR and CLER, introduce new regularization terms in the loss function of existing CL models. CaSpeR leverages spectral geometry techniques to constrain class representations, enhancing clustering behavior. CLER explores how invariant and equivariant self-supervised approaches impact the latent space of a model, exploiting their benefits to prevent forgetting. With the advent of Vision Transformers (ViTs), a new method named SCAD is proposed to adapt these architectures to new tasks through distillation and binary masks between the model's internal layers. Finally, this thesis investigates the impact of pretrained Vision-Language Models (VLMs), such as CLIP, in a CL scenario. These models align the latent representations of images and their corresponding captions, enabling zero-shot learning on classification tasks. This work presents two innovative methods to adapt VLMs to a CL scenario by leveraging the model's internal representations. STAR-Prompt employs a two-level prompting approach to balance stability (the capacity to remember past knowledge) and plasticity (the ability to learn new tasks). CGIL utilizes Variational Autoencoders to perform generative replay in the embedding space of CLIP, demonstrating state-of-the-art performance on both standard CL benchmarks and new scenarios that test the model's zero-shot capabilities.
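As a purely illustrative sketch (not drawn from the thesis itself), regularization-based CL methods of the kind mentioned above, such as CaSpeR and CLER, can be read as adding a penalty term to the training objective of an existing model; the symbols below are generic notation introduced here for illustration, not the thesis's own definitions:

\[
\mathcal{L}(\theta) \;=\; \mathcal{L}_{\text{task}}\bigl(\theta;\, \mathcal{D}_t\bigr) \;+\; \lambda\, \mathcal{L}_{\text{reg}}\bigl(\theta;\, \mathcal{D}_t, \mathcal{M}\bigr),
\]

where \(\mathcal{D}_t\) is the data of the current task, \(\mathcal{M}\) an optional rehearsal buffer, \(\mathcal{L}_{\text{task}}\) the standard classification loss, \(\mathcal{L}_{\text{reg}}\) a method-specific term acting on the latent representations (for example, a clustering constraint in the spirit of CaSpeR or a self-supervised objective in the spirit of CLER), and \(\lambda\) a coefficient trading off plasticity against stability. The exact form of each regularizer is defined by the individual methods described in the thesis.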
Citation:
Frascaroli, Emanuele. "A Journey into the Latent Space of Continual Learning Models." 2025.