Exploiting Synthetic Data to Improve the Understanding of Human Behavior
Abstract: Most recent Deep Learning techniques require large volumes of training data to achieve human-like performance. In Computer Vision especially, datasets are expensive to create because they usually require considerable manual effort that cannot be automated. Manual annotation is also error-prone, inconsistent for subjective tasks (e.g. age classification), and not applicable to certain kinds of data (e.g. high frame-rate videos). For some tasks, such as pose estimation and tracking, an alternative to manual annotation is the use of wearable sensors. However, this approach is not feasible in some circumstances (e.g. in crowded scenarios), since the need to wear sensors limits its application to controlled environments. To overcome these limitations, we collected a set of synthetic datasets by exploiting a photorealistic videogame. Because the data is generated by a virtual simulator, the annotations are error-free and always consistent, since no manual annotation is involved. Moreover, our data is suitable for in-the-wild applications, as it contains multiple scenarios and a wide variety of human appearances. In addition, our datasets are privacy compliant, as no real human was involved in the data acquisition. Leveraging this newly collected data, we conducted extensive studies on a range of tasks. In particular, for 2D pose estimation and tracking, we propose a deep network architecture that jointly extracts people's body parts and associates them across short temporal spans. Our model explicitly deals with occluded body parts by hallucinating plausible locations for joints that are not visible. For 3D pose estimation, we propose to use high-resolution volumetric heatmaps to model joint locations, devising a simple and effective compression method to drastically reduce the size of this representation.
For attribute classification, we address a common problem in surveillance, namely the occlusion of people, by designing a network capable of hallucinating a plausible appearance for occluded people. From a more practical point of view, we design an edge-AI system capable of evaluating in real time the COVID-19 contagion risk of a monitored area by analyzing video streams. As synthetic data might suffer from domain-shift-related problems, we further investigate image translation techniques for the tasks of head pose estimation, attribute recognition, and face landmark localization.
Citation: Fabbri, Matteo. "Sfruttare i Dati Sintetici per Migliorare la Comprensione del Comportamento Umano". 2021.