Workshop Human-centered Vision: from Body Analysis to Learning and Language

Workshop organizers: Guido Borghi, Prof. Rita Cucchiara, Prof. Davide Maltoni

Continual Learning

1. Real-Time Continual Learning from Natural Video Streams (Vincenzo Lomonaco and Lorenzo Pellegrini)

[Presentation]

Abstract: Humans have the extraordinary ability to learn continually from experience. Not only can we apply previously learned knowledge and skills to new situations, we can also use these as the foundation for later learning, constantly and efficiently updating our biased understanding of the external world. In contrast, current AI systems are usually trained offline on huge datasets and later deployed with frozen learning capabilities, as they have been shown to suffer from catastrophic forgetting if trained continuously on changing data distributions. In this talk, we will present the BioLab research efforts to address these issues and support the vision of truly adaptive, pervasive and scalable AI systems. In particular, we will introduce efficient continual learning strategies that can learn in real time after deployment from high-dimensional streaming data, such as natural videos, directly at the edge.

2. Rehearsal-Based Classification in Continual Learning (Matteo Boschini, Pietro Buzzega and Simone Calderara)

[Presentation]

Abstract: In Continual Learning, a neural network is trained on a stream of data whose distribution changes over time. The problem known in the literature as Catastrophic Forgetting makes it difficult to train methods that remain effective on previously seen examples rather than being accurate only on the most recent ones. In our recent work, we show how a simple rehearsal-based baseline (Experience Replay) can be trained with a few simple tricks to reach performance comparable to the state of the art. In addition, we propose Dark Experience Replay (DER): a new baseline that combines rehearsal and knowledge distillation. This method outperforms the current state of the art while remaining simple and keeping a limited resource footprint. Finally, by proposing a new dataset (MNIST-360), we also show how DER can be advantageously applied in General Continual Learning: a recently formulated experimental setting that aims to bridge the gap between recent CL studies and their application in the wild.
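As a rough illustration of how rehearsal and knowledge distillation can be combined, the sketch below keeps a small fixed-size memory of past examples together with the logits the network produced for them, and adds a penalty when the current logits drift away from the stored ones. It is not the authors' implementation: the names ReplayBuffer, distillation_penalty and replay_loss, the reservoir-sampling policy, the mean squared error penalty and the weight alpha are all assumptions made for this example.

```python
import random


class ReplayBuffer:
    """Fixed-size rehearsal memory (hypothetical helper, filled by reservoir sampling)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []   # stored (example, label, logits) triples
        self.seen = 0     # number of stream items offered so far
        self.rng = random.Random(seed)

    def add(self, example, label, logits):
        self.seen += 1
        record = (example, label, list(logits))
        if len(self.items) < self.capacity:
            self.items.append(record)
        else:
            # Reservoir sampling: every stream item ends up stored
            # with probability capacity / seen.
            slot = self.rng.randrange(self.seen)
            if slot < self.capacity:
                self.items[slot] = record

    def sample(self, batch_size):
        k = min(batch_size, len(self.items))
        return self.rng.sample(self.items, k)


def distillation_penalty(current_logits, stored_logits):
    """Mean squared error between current and stored logits (assumed penalty)."""
    diffs = [(c - s) ** 2 for c, s in zip(current_logits, stored_logits)]
    return sum(diffs) / len(diffs)


def replay_loss(task_loss, model, batch, alpha=0.5):
    """Task loss on new data plus alpha times the average penalty on replayed items."""
    penalties = [distillation_penalty(model(x), z) for x, _, z in batch]
    if not penalties:
        return task_loss
    return task_loss + alpha * sum(penalties) / len(penalties)
```

In a training loop, each incoming example would be offered to the buffer with the logits computed at that moment, and each update step would draw a small batch from the buffer to evaluate replay_loss. Plain Experience Replay would instead replay the stored labels with the ordinary classification loss.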

Body and Face Analysis

3. People Behavior and Face Understanding (Roberto Vezzani, Stefano Pini, Matteo Fabbri and Fabio Lanzi)

[Presentation 1]

Abstract 1: We present a novel approach for bottom-up multi-person 3D human pose estimation from monocular RGB images. We propose to use high-resolution volumetric heatmaps to model joint locations, devising a simple and effective compression method to drastically reduce the size of this representation. At the core of the proposed method lies our Volumetric Heatmap Autoencoder, a fully-convolutional network. Our experimental evaluation shows that our method performs favorably when compared to the state of the art on both multi-person and single-person 3D human pose estimation datasets.
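To make the idea of a volumetric heatmap concrete, the sketch below encodes a single joint as a 3D Gaussian on a small voxel grid and recovers its position by taking the voxel with the highest value. It only illustrates the representation and does not reproduce the Volumetric Heatmap Autoencoder; the function names, the grid shape and the Gaussian width sigma are assumptions chosen for this example.

```python
import math


def make_volumetric_heatmap(joint, shape, sigma=1.0):
    """Return a depth x height x width grid with a Gaussian peak at `joint`.

    `joint` is a (z, y, x) voxel coordinate; `shape` is (depth, height, width).
    """
    depth, height, width = shape
    jz, jy, jx = joint
    return [
        [
            [
                math.exp(-((z - jz) ** 2 + (y - jy) ** 2 + (x - jx) ** 2)
                         / (2 * sigma ** 2))
                for x in range(width)
            ]
            for y in range(height)
        ]
        for z in range(depth)
    ]


def decode_joint(heatmap):
    """Recover the (z, y, x) coordinate of the voxel with the highest value."""
    best_value, best_index = -1.0, None
    for z, plane in enumerate(heatmap):
        for y, row in enumerate(plane):
            for x, value in enumerate(row):
                if value > best_value:
                    best_value, best_index = value, (z, y, x)
    return best_index


def voxel_count(shape):
    """Number of values needed to store one joint's heatmap."""
    depth, height, width = shape
    return depth * height * width
```

The voxel count grows with the product of the three grid dimensions and is multiplied again by the number of joints, which is why a high-resolution volumetric representation benefits from compression.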

[Presentation 2]

Abstract 2: Recently, advances in development and miniaturization have supported the spread of low-cost but high-quality RGB-D sensors. Given that they can properly acquire 3D data at short-range distances, they have enabled a variety of potential applications. However, there is still a lack of public datasets and techniques that support the development of deep learning methods able to work with depth data and handle different acquisition settings. To this end, we will present several datasets that we have acquired with multiple devices for face recognition, human pose estimation and gesture recognition. Moreover, we will show the methods and analyses that we have developed to address these tasks.

4. The challenge of Morphing for border control (Matteo Ferrara and Annalisa Franco)

[Presentation]

Abstract: Face morphing nowadays represents a major security threat in the context of electronic identity documents, as well as an interesting challenge for researchers in the field of face recognition. Despite the good performance obtained by state-of-the-art approaches on digital images, no satisfactory solutions have been identified so far to deal with cross-database testing and printed-scanned images, typically used in many countries for document issuing. In this talk, we will analyze the current state-of-the-art methods for the Face Morphing Attack Detection task, focusing on the differential techniques. In addition, we will discuss the current open issues and possible future application scenarios.

Language and Data Generation

5. Generative Models for Image Translation and Continual Learning (Gabriele Graffieti)

[Presentation]

Abstract: This short talk analyzes several applications of generative models to computer vision and continual learning. We first briefly introduce generative models and the properties that distinguish them from discriminative models. We then present some typical tasks that can be solved with such models, focusing in particular on image-to-image translation tasks such as removing fog from images (defogging) and the automatic editing of images in the context of face morphing and demorphing. Finally, we present a use case in which a generative model serves as a memory that reduces the forgetting of an incrementally trained model by replaying generated patterns similar to those seen in the past.
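The use of a generative model as a memory can be sketched with a deliberately simple stand-in: instead of storing past examples, the toy class below stores one 1-D Gaussian per previously seen class and samples from it whenever new data arrives. A real system would use a learned generative model over images; the class GaussianMemory, the function build_training_batch and the 1-D data are assumptions made purely for illustration.

```python
import random
import statistics


class GaussianMemory:
    """Toy generative memory: one 1-D Gaussian per class (stand-in for a real generator)."""

    def __init__(self, seed=0):
        self.params = {}  # label -> (mean, standard deviation)
        self.rng = random.Random(seed)

    def fit(self, label, values):
        # Summarize the class by its mean and spread instead of keeping raw samples.
        self.params[label] = (statistics.fmean(values), statistics.pstdev(values))

    def generate(self, n_per_class):
        # Draw synthetic (value, label) pairs resembling the classes seen in the past.
        samples = []
        for label, (mu, sigma) in self.params.items():
            samples.extend(
                (self.rng.gauss(mu, sigma), label) for _ in range(n_per_class)
            )
        return samples


def build_training_batch(new_data, memory, n_replay):
    """Mix real examples of the new task with generated examples of old classes."""
    return list(new_data) + memory.generate(n_replay)
```

Training the incremental model on the mixed batch, rather than on the new data alone, is what exposes it again to patterns from earlier tasks without keeping the original samples.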

6. Vision, Language and Action: from Captioning to Embodied AI (Lorenzo Baraldi, Federico Landi and Marcella Cornia)

[Presentation]

Abstract: A walkthrough of our research activities at the intersection of Computer Vision, Natural Language Processing and Embodied AI. We will cover architectures for connecting Vision and Language, from the traditional ones based on the recurrence paradigm to our more recent proposals built on fully-attentive architectures. We will also describe how such architectures can be made controllable from the outside, and how they can be extended to describe objects that were not seen in the training set. We will then discuss how these approaches can be used on embodied agents that can interact with the physical world, for navigation and for other embodied tasks.