Theses @ ImageLab

Available

Speaksee: a PyTorch library for Visual-Semantic tasks

Type: Master's
Abstract: Speaksee is a Python package that AImageLab is developing to create a friendly and effective research environment for visual-semantic tasks. Once complete, Speaksee will support the following features:

Development of state-of-the-art captioning models, both convolutional and recurrent;
Development of high-efficiency visual-semantic retrieval systems;
Support for Visual Question Answering;
Data-loading utilities: support for state-of-the-art datasets and efficient loading and pre-processing of videos;
Utilities for performance evaluation, monitoring and debugging of the developed models;
Model zoo with state-of-the-art image/video captioning, retrieval and visual question answering models.

The candidate will develop core features of the library and test them with particular attention to efficiency and user-side usability, possibly producing new models that improve on the state of the art. Adequate knowledge of PyTorch internals and of the state of the art in the visual-semantic field is required; familiarity with the Python packaging workflow and with unit testing is a plus.
External internship: No
Supervisor: Prof. Rita Cucchiara
Co-supervisor 1: Lorenzo Baraldi

Assigned

Completed

Head Pose Estimation for Automotive Contexts

Type: Master's
Abstract: This thesis proposes a system for estimating the orientation of the driver's head and body, developed with Computer Vision and Deep Learning techniques and algorithms. The position and orientation of the head in three-dimensional space are a rich and varied source of information for the automotive field: automatic monitoring of the driver's head position naturally supports attention monitoring, able to detect the moments in which distraction takes over from normal driving.
The proposed system meets the specific requirements of the highly dynamic, complex and unstructured automotive setting: estimation accuracy, for an effective and reliable system; real-time performance, to raise any alarm to the driver as early as possible; invariance to changes in illumination, even strong ones, so that monitoring is always active regardless of the time of day (day or night) and the weather (achieved by using depth images from infrared-based sensors); and robustness to occlusions, since parts of the body are often not fully visible because of the driver's movements and the limited space inside the cabin.
Student: Marco Venturelli
External internship: No
Supervisor: Prof. Rita Cucchiara
Co-supervisor 1: Prof. Roberto Vezzani
Co-supervisor 2: Guido Borghi
Project: DriverAttention - Monitoring the car driver’s attention with multisensory systems, computer vision and machine learning
Research activity: Driver Attention through Head Localization and Pose Estimation

Study and Experimentation of Computer Vision Systems for Driver Monitoring

Type: Master's
Abstract: Natural interfaces for interaction between humans and digital devices are spreading to every domain, including automotive. Voice, behavioral and gesture commands are gradually replacing physical controls such as buttons and knobs. Interaction with the vehicle thus becomes natural and bidirectional; the vehicle itself acquires the ability to signal, predict and monitor both the outside world and the driver, for example to check the driver's level of attention. In this context, computer vision and machine learning techniques were studied for driver monitoring, and in particular for estimating head orientation without any wearable devices or markers. A system based on convolutional neural networks was built that estimates head orientation by detecting facial landmarks, such as the positions of the eyes, nose and mouth. Then, in collaboration with Ferrari S.p.A., an experimental phase was carried out to test the proposed algorithms directly in a car, comparing their performance with commercial systems.
Student: Elia Frigieri
External internship: Yes
Supervisor: Prof. Rita Cucchiara
Co-supervisor 1: Prof. Roberto Vezzani
Co-supervisor 2: Guido Borghi
Project: DriverAttention - Monitoring the car driver’s attention with multisensory systems, computer vision and machine learning
Research activity: Driver Attention through Head Localization and Pose Estimation

Image and Video Captioning with Transferred Semantic Attributes

Type: Master's
Abstract: This thesis presents a generative model based on Recurrent and Convolutional Neural Networks that automatically produces a natural-language description of an image. It also presents a technique for extracting semantic content from images and using it during the generation loop, in order to produce a description that is as accurate as possible. First, a baseline model was built from a convolutional neural network for image classification (VGG16) and an LSTM (Long Short-Term Memory) recurrent neural network: the former extracts the *important* information contained in the image, while the latter generates the natural-language description. Then, several methods to improve the model's performance were tested.
Student: Gianluca Puglia
External internship: No
Supervisor: Prof. Costantino Grana
Co-supervisor 1: Lorenzo Baraldi
Project: Città Educante
Research activity: Video Captioning
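In a VGG16 + LSTM captioner like the baseline above, the LSTM emits one word at a time, and the simplest way to turn its per-step word probabilities into a sentence is greedy decoding. The plain-Python sketch below shows only that decoding loop; `step_fn`, the `<start>`/`<end>` tokens and the toy step function are hypothetical stand-ins for the trained model, not the thesis code.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: given the previous word and a recurrent state
# (in a real model, initialised from the CNN image features), return a
# probability for each vocabulary word and the updated state.
StepFn = Callable[[str, object], Tuple[Dict[str, float], object]]


def greedy_decode(step_fn: StepFn, init_state: object, max_len: int = 20) -> List[str]:
    """Generate a caption by picking the most probable word at every step."""
    words: List[str] = []
    prev, state = "<start>", init_state
    for _ in range(max_len):
        probs, state = step_fn(prev, state)
        prev = max(probs, key=probs.get)  # argmax over the vocabulary
        if prev == "<end>":
            break
        words.append(prev)
    return words


# Toy step function standing in for the trained LSTM: the state is just the
# position in a fixed sentence, so the output is deterministic.
def toy_step(prev: str, state: int) -> Tuple[Dict[str, float], int]:
    sentence = ["a", "dog", "on", "the", "grass", "<end>"]
    target = sentence[min(state, len(sentence) - 1)]
    probs = {w: 0.1 for w in sentence}
    probs[target] = 0.9
    return probs, state + 1


if __name__ == "__main__":
    print(" ".join(greedy_decode(toy_step, 0)))  # a dog on the grass
```

Greedy decoding is only the simplest option; beam search keeps several partial captions per step and usually produces better sentences at a higher cost.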

Understanding and improving face analysis in the wild: face and emotion recognition in the deep learning era

Type: Master's
Abstract:

In recent years, deep learning has completely revolutionized computer vision: deep learning-based approaches outperform classical techniques in nearly every field, face analysis included. This thesis project therefore aims to study, apply and improve the latest deep learning techniques for face and emotion recognition in unconstrained environments.
The work will be carried out in collaboration with Prof. Benoit Huet and Dr. Olfa Ben Ahmed of EURECOM (France). Two months will be spent at the EURECOM offices in the Sophia Tech Campus of Sophia Antipolis (Biot, France).

Student: Stefano Pini
External internship: No
Supervisor: Prof. Rita Cucchiara
Co-supervisor 1: Lorenzo Baraldi
Co-supervisor 2: Marcella Cornia
Project: Città Educante
Research activity: Video Captioning with Naming

Parallelization of Connected Components Labeling Algorithms

Type: Master's
Abstract: Connected Components Labeling (CCL) is a well-known problem with many applications in image processing, physics, engineering and numerous other fields. Labeling transforms an image into a new symbolic one in which all pixels belonging to the same connected component (object) receive the same identification label; labeling is therefore required whenever a computer or system needs to recognize objects (connected components) in binary images. Because the efficiency of CCL is critical in many applications, many algorithms have been proposed in the past. They differ mostly in execution time, and their performance can easily be benchmarked and compared. Unfortunately, very few articles and implementations in the literature show the benefits of parallel connected components labeling algorithms. This thesis aims to find a viable way to parallelize CCL and to measure the gain of these techniques over the traditional ones. Parallelization was addressed with two distinct approaches: SIMD instructions and multithreading. The first strategy was applied to Haralick’s multiscan algorithm, using the speedup provided by AVX2 instructions. The second was implemented with the Intel TBB framework and applied to two-scan algorithms. Furthermore, the multithreaded approach was optimized to work with many other frameworks, allowing parallelization on different architectures and operating systems. Parallelization was applied to distinct algorithms, such as the one proposed by Wu et al. in “Two Strategies to Speed up Connected Component Labeling Algorithms” and the “Optimized Block Based with Decision Trees” algorithm published by Grana et al.
Finally, the optimized algorithms were submitted to OpenCV, an open source computer vision and machine learning software library, in order to provide faster implementations to the worldwide computer vision community.
Student: Michele Cancilla
External internship: No
Supervisor: Prof. Costantino Grana
Co-supervisor 1: Federico Bolelli
Research activity: Connected Components Labeling
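The two-scan scheme that the parallel versions above start from can be sketched in plain Python: a first raster scan assigns provisional labels and records label equivalences in a union-find structure, and a second scan replaces each provisional label with a consecutive final one. This is a minimal sequential sketch assuming 8-connectivity and a binary image given as a list of lists; it is not the AVX2 or TBB code from the thesis, nor the OpenCV implementation.

```python
from typing import Dict, List


def find(parent: List[int], x: int) -> int:
    # Follow parent links to the root, halving the path along the way.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent: List[int], a: int, b: int) -> None:
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)  # keep the smaller label as root


def label(image: List[List[int]]) -> List[List[int]]:
    """Two-scan connected components labeling with 8-connectivity."""
    h, w = len(image), (len(image[0]) if image else 0)
    labels = [[0] * w for _ in range(h)]
    parent = [0]  # index 0 is the background label
    # First scan: provisional labels from the already-visited neighbours.
    for r in range(h):
        for c in range(w):
            if not image[r][c]:
                continue
            neigh = [labels[rr][cc]
                     for rr, cc in ((r - 1, c - 1), (r - 1, c), (r - 1, c + 1), (r, c - 1))
                     if 0 <= rr < h and 0 <= cc < w and labels[rr][cc]]
            if not neigh:
                parent.append(len(parent))  # new provisional label
                labels[r][c] = len(parent) - 1
            else:
                labels[r][c] = min(neigh)
                for n in neigh:
                    union(parent, labels[r][c], n)  # record equivalences
    # Second scan: resolve equivalences into consecutive final labels.
    final: Dict[int, int] = {}
    for r in range(h):
        for c in range(w):
            if labels[r][c]:
                root = find(parent, labels[r][c])
                labels[r][c] = final.setdefault(root, len(final) + 1)
    return labels
```

For a U-shaped object, the two arms first receive different provisional labels, and the bottom row merges them into a single component during the first scan.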

Generative adversarial models for people attribute recognition in surveillance contexts

Type: Master's
Abstract: Security is of fundamental importance in a world where terrorist attacks are steadily increasing. Governments and agencies face this reality every day, but the means at their disposal are not always sufficient to prevent such attacks effectively. Security draws on many fields of science and engineering and offers many areas of study. Among these research opportunities, this work addresses the classification of people's attributes (such as age and sex) and carried items (backpacks, bags, etc.) from security cameras. Computer-vision-based deep learning techniques and generative models are exploited to solve this problem automatically. The objective of the work is to explore the generalization capability of adversarial networks to enhance the resolution of images of people, which is typically too low when they are acquired by surveillance cameras. The network is then used to detect and classify people's attributes by exploiting the power of convolutional models. Finally, experiments show that this approach can improve state-of-the-art results both when the target resolution is poor and when the target image is corrupted or occluded.
Student: Matteo Fabbri
External internship: No
Supervisor: Prof. Rita Cucchiara
Co-supervisor 1: Prof. Simone Calderara

Integration of a Sensorized Floor and RGB-D Cameras for People Detection in Sports

Type: Master's
Abstract: Natural interfaces, capable of creating effective and powerful human-machine interactions, are becoming increasingly important. This thesis considers two examples of such devices: sensorized floors and RGB-D cameras. Their many possible applications, from furnishing spaces for recreational or educational purposes to motor rehabilitation areas and sports, make both devices of considerable interest for research and industry alike. An extensive joint experimental study of these two types of sensors was applied to the three-dimensional reconstruction of moving subjects in sports. The final result is a system able to identify and track over time the athletes on the floor, even in the presence of crowding and occlusions, conditions under which either technology alone performs unsatisfactorily.
Student: Niccolò Battolla
External internship: No
Supervisor: Prof. Rita Cucchiara
Co-supervisor 1: Prof. Roberto Vezzani
Project: Una piattaforma sensoristica avanzata per rinnovare la pratica e la fruizione dello sport, del benessere, della riabilitazione e del gioco educativo

Deep Q-Learning Techniques Applied to Forex Trading

Type: Master's
Abstract: Deep Reinforcement Learning has attracted great interest in recent years, achieving notable results in domains such as games and robotics, where previously used methods had failed. Although games are largely responsible for the growing importance of Reinforcement Learning, this thesis studies and applies a particular algorithm of this family, Deep Q-Learning, to find potentially profitable strategies for the Foreign Exchange (Forex) market.
Student: Riccardo Folloni
External internship: Yes
Supervisor: Prof. Simone Calderara
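The core of Deep Q-Learning is the regression target used to train the Q-network: for a transition (s, a, r, s'), the target is r + γ · max over a' of Q_target(s', a'), with no bootstrap term when the episode has ended. The plain-Python sketch below computes these targets and the squared TD loss for a mini-batch; the `Transition` fields, the discount factor and the example numbers are illustrative assumptions, and in practice the Q-values would come from a neural network over market states rather than hard-coded lists.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Transition:
    reward: float            # e.g. profit or loss realised by the action (assumption)
    next_q: Sequence[float]  # target-network Q-values for the next state, one per
                             # action (e.g. sell, hold, buy; hypothetical action set)
    done: bool               # True if the episode ended after this step


def dqn_targets(batch: List[Transition], gamma: float = 0.99) -> List[float]:
    """Bellman targets y = r + gamma * max_a' Q_target(s', a'), no bootstrap if done."""
    targets = []
    for t in batch:
        bootstrap = 0.0 if t.done else gamma * max(t.next_q)
        targets.append(t.reward + bootstrap)
    return targets


def squared_td_loss(q_taken: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between Q(s, a) of the actions taken and their targets."""
    return sum((q - y) ** 2 for q, y in zip(q_taken, targets)) / len(targets)


batch = [Transition(1.0, [0.0, 0.5, 2.0], False), Transition(-0.5, [3.0, 1.0, 0.0], True)]
print(dqn_targets(batch, gamma=0.5))  # [2.0, -0.5]
```

Using a separate, periodically updated target network for `next_q` is what keeps these regression targets stable while the online network is being trained.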

Action recognition applied to basketball players through Deep Learning methods

Type: Master's
Abstract: Extracting information and performance indices in sports is increasingly important. As revenues in sport grow, teams invest more and more in collecting statistics on their own athletes and on those of opposing teams, since this information can be crucial both for improving performance and for winning against opponents. The thesis aims to build an action recognition system for the basketball court: given a set of actions (block, shot, defense, etc.), the system uses Deep Learning techniques to classify the type of action each player is performing.
Student: Simone Francia
External internship: No
Supervisor: Prof. Simone Calderara
Co-supervisor 1: Fabio Lanzi
Project: Una piattaforma sensoristica avanzata per rinnovare la pratica e la fruizione dello sport, del benessere, della riabilitazione e del gioco educativo

Body Pose Estimation and Hand Detection for Driver Monitoring

Type: Master's
Abstract: Analyzing the interaction between humans and their environment is one of the main goals of Computer Vision, in both academic research and industry. Among the sectors most interested in automated systems, automotive is one of the most prominent. In a world moving toward self-driving cars, studying the behavior of the passengers and the driver inside the cabin is an ambitious task, both as a technological challenge and for the social, moral and legal aspects it may involve. The goal of this thesis is Driver Monitoring, the task of assessing and monitoring the actions of the vehicle's driver. The head, eyes and hands, all in the upper part of the body, are the body parts that most affect driving. Studying the position of the hands, which actively control the car, provides information on the driver's state of attention. Gesture Analysis relies on correct detection of the hands, called Hand Detection.
Body Pose Estimation means detecting all the body parts of a subject; this information makes it possible to study the movement of the whole skeleton. Since the hands are among the joints of the human body, body pose estimation can also be used to perform hand detection. This thesis focuses on these two tasks, Hand Detection and Body Pose Estimation. Several neural network architectures are presented to solve both problems in the automotive setting. The input data are of two kinds: RGB images and depth images, called Depth Maps.
Student: Andrea D'Eusanio
External internship: No
Supervisor: Prof. Roberto Vezzani
Co-supervisor 1: Guido Borghi
Co-supervisor 2: Stefano Pini
Project: RedVision Lab

Spectral Pooling Techniques for Sequence Classification

Type: Master's
Abstract:
Student: Matteo Stefanini
External internship: No
Supervisor: Prof. Simone Calderara
Co-supervisor 1: Lorenzo Baraldi
Research activity: Deep Learning in videos

Visual and textual embeddings for cultural heritage applications

Type: Master's
Abstract:
Student: Angelo Carraggi
External internship: No
Supervisor: Prof. Rita Cucchiara
Co-supervisor 1: Lorenzo Baraldi
Co-supervisor 2: Marcella Cornia
Project: CultMEDIA

Deep learning-based framework for driver attention analysis and Human-Car Interaction

Type: Master's
Abstract: Real-time monitoring of driver attention and human-vehicle interaction are two key elements in the development of a driving support system.
In this work, we combined several existing approaches, based on RGB and depth images, to develop a prototype system that monitors the driver's head movements. Its goals are: (a) locating the driver's head, (b) estimating the head angles, and (c) identifying the position and state of the eyes and mouth. The system takes as input RGB and depth images acquired with a Kinect 2.0 device and outputs the PERCLOS measure and an estimate of the driver's gaze. The system is fully modular, so that the failure of one part only partially degrades the whole system. Laboratory experiments showed that the prototype can estimate the driver's attention in all lighting conditions and can precisely identify the objects the driver is looking at.
Student: Diego Ballotta
External internship: No
Supervisor: Prof. Rita Cucchiara
Co-supervisor 1: Prof. Roberto Vezzani
Co-supervisor 2: Guido Borghi
Project: DriverAttention - Monitoring the car driver’s attention with multisensory systems, computer vision and machine learning
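PERCLOS (PERcentage of eyelid CLOSure), one of the outputs of the prototype above, is commonly defined as the fraction of time, over a sliding window, during which the eyes are at least 80% closed. Assuming the eye-state module produces one eye-openness value in [0, 1] per frame, it can be computed as sketched below; the `Perclos` class, the 0.2 openness threshold (the common P80 criterion) and the default window length are assumptions, since the abstract does not state the exact definition used in the thesis.

```python
from collections import deque
from typing import Deque, Iterable, List


class Perclos:
    """Sliding-window PERCLOS: share of recent frames with the eyes (nearly) closed."""

    def __init__(self, window: int = 900, closed_below: float = 0.2) -> None:
        # window: number of frames kept (900 frames = 30 s at 30 fps, an assumption)
        # closed_below: openness at or under which a frame counts as closed (P80)
        self.closed_below = closed_below
        self.frames: Deque[bool] = deque(maxlen=window)

    def update(self, openness: float) -> float:
        """Add one frame's eye openness in [0, 1] and return the current PERCLOS."""
        self.frames.append(openness <= self.closed_below)
        return sum(self.frames) / len(self.frames)


def perclos_series(openness: Iterable[float], window: int,
                   closed_below: float = 0.2) -> List[float]:
    """PERCLOS value after each frame of an openness sequence."""
    p = Perclos(window, closed_below)
    return [p.update(o) for o in openness]


print(perclos_series([1.0, 0.1, 0.1, 1.0], window=2))  # [0.0, 0.5, 1.0, 0.5]
```

A bounded `deque` drops the oldest frame automatically, so each update costs time proportional to the window; a running counter would make it constant-time if needed.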

Generative architectures for bridging art and real world

Type: Master's
Abstract:
Student: Matteo Tomei
External internship: No
Supervisor: Prof. Rita Cucchiara
Co-supervisor 1: Lorenzo Baraldi
Co-supervisor 2: Marcella Cornia
Project: CultMEDIA
Research activity: Art2Real: Translating Artworks to Photo-Realistic Images

Visual Reasoning: learning and composing primitives in an associative memory

Type: Master's
Abstract: The ability to compose simple logical steps into complex reasoning plays a fundamental role in human cognition. In contrast, in modern deep neural network applications, training aims at maximizing the network's performance on a single task. This work proposes an end-to-end differentiable model capable of learning and storing in memory a set of primitive functions that can be composed into reasoning. The model is evaluated on the CLEVR dataset, where a deep understanding of image content is necessary to correctly answer the related questions. The preliminary work includes the analysis of models inspired by the Neural Turing Machine on a synthetic dataset created specifically to explore this problem. Furthermore, the possibility of using weak supervision is carefully investigated by comparing different neural network training strategies, including various reinforcement learning techniques.
Student: Rosario Di Carlo
External internship: No
Supervisor: Prof. Simone Calderara
Co-supervisor 1: Davide Abati
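Models inspired by the Neural Turing Machine, like those analyzed above, typically read their associative memory through content-based addressing: a key is compared with every memory row by cosine similarity, the similarities are sharpened by a key strength β and normalized with a softmax, and the read vector is the resulting weighted sum of rows. The plain-Python sketch below shows that generic read operation; it is an illustration of the NTM mechanism, not the thesis model, whose memory stores primitive functions rather than plain vectors.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity; eps avoids division by zero for all-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + eps)


def content_read(memory: List[Vector], key: Vector,
                 beta: float) -> Tuple[List[float], List[float]]:
    """Return (attention weights, read vector) for a content-based memory read."""
    scores = [beta * cosine(key, row) for row in memory]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Read vector: attention-weighted sum of the memory rows.
    read = [sum(w * row[j] for w, row in zip(weights, memory))
            for j in range(len(memory[0]))]
    return weights, read


weights, read = content_read([[1.0, 0.0], [0.0, 1.0]], key=[1.0, 0.0], beta=0.0)
print(weights, read)  # [0.5, 0.5] [0.5, 0.5]
```

With β = 0 every row gets equal weight; larger β concentrates the weights on the rows most similar to the key. Because every step is smooth, the read is differentiable, which is what allows such memories to be trained end to end with backpropagation.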