AImageLab - Publications

Action recognition in videos through computational, multimedia, and machine learning technologies

Abstract: Video clips are the most pervasive means of disseminating information today. As they have spread, the need for automatic categorization and content understanding has also grown, for both entertainment and professional purposes. In the context of multimedia and deep learning technologies for video comprehension, we explore and devise video-based algorithms and state-of-the-art solutions for action recognition and fine-grained action localization. Our research is not limited to quantitatively evaluating how the proposed approaches improve performance on specific tasks. We observe that handling video content usually brings some drawbacks. Videos often involve human actors and may raise privacy issues that the computer vision community has not yet sufficiently investigated. Moreover, given their complexity and variability, videos are hard to process and often require large computational resources. Therefore, in addition to the application scenario, this thesis tackles two main challenges of automatic video processing: privacy and computation. In the application part, we investigate the simultaneous detection of multiple actors and the classification of their actions, exploiting interactions between people and surrounding objects in both space and time. We also explore a more production-oriented application, in collaboration with Metaliquid SRL and in line with the company’s needs, by devising a deep network for salient action spotting in broadcast soccer matches. Regarding privacy, we propose a novel strategy for masking people’s identities in video clips while preserving the ability of action recognition models to predict the correct class labels. Finally, from the computational perspective, we develop an algorithm that reduces the size and resource usage of existing deep neural networks while preserving their performance.
These three aspects of video modeling are investigated separately but have proved to be generalizable, making it easier to build efficient and privacy-preserving action recognition models. All the alternatives and solutions presented in this work build upon deep learning, which requires large amounts of data to learn video representations.

Citation:

Tomei, Matteo "Riconoscimento di azioni nei video tramite tecnologie computazionali, multimediali e di apprendimento automatico" 2022
