
Are Learnable Prompts the Right Way of Prompting? Adapting Vision-and-Language Models with Memory Optimization
Abstract: Few-shot learning (FSL) requires fine-tuning a pretrained model on a limited set of examples from novel classes. When applied to vision-and-language models, the dominant approach for FSL has been to learn input prompts that are concatenated to the input context of the model. Despite their considerable promise, the effectiveness and expressive power of prompts are limited by the fact that they can only be placed at the input of the architecture. In this article, we critically question the use of learnable prompts and instead leverage the concept of “implicit memory” to directly capture low- and high-level relationships within the attention mechanism, at any layer of the architecture, thereby establishing an alternative to prompts in FSL. Our proposed approach, termed MemOp, exhibits superior performance across 11 widely recognized image classification datasets and a benchmark for contextual domain shift evaluation, effectively addressing the challenges associated with learnable prompts.
Citation:
Moratelli, Nicholas; Barraco, Manuele; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita, "Are Learnable Prompts the Right Way of Prompting? Adapting Vision-and-Language Models with Memory Optimization," IEEE Intelligent Systems, vol. 39, pp. 26-34, 2024. DOI: 10.1109/MIS.2024.3386099