
Smart Cities: Connecting the Visual Viewpoints of Driver, Vehicle and Infrastructure.

Abstract: The number of interconnected devices around us is growing rapidly. According to a recent Gartner report, 20.4 billion connected "things" are expected to be in use by the end of 2020. Cities are no exception: as most of the world population congregates in urban areas, the smart mobility sector is expanding quickly and has become a strong driving force in this direction. Vehicles, in the first place, are turning into sophisticated data crunchers, featuring a wide range of sensors that enable ever-increasing perception capabilities. Cameras constitute a large share of these devices. In vehicles, inward-facing cameras monitor the state of the driver and passengers, while multiple outward-facing cameras are devoted to understanding the surrounding scene. At the same time, a massive number of infrastructure cameras are being installed around cities, with applications to surveillance, traffic flow monitoring and prediction, and license plate recognition, among others. In this context, this thesis investigates how multiple visual viewpoints of the same urban scene can be related to each other and how novel viewpoints can be generated.

We start from the study of the driver's point of view. To this end, we collect and make publicly available a novel dataset called DR(eye)VE, composed of more than 500,000 frames of driving sequences annotated with drivers' gaze fixations, whose temporal integration provides task-specific saliency maps. On this dataset we perform an in-depth analysis of drivers' attentional patterns in real-world conditions. Building on these findings, we then engineer and design the first deep-learning-based computational model of human attention during the driving task.

We then investigate whether a mapping can be learnt between the aforementioned first-person viewpoint and other views of the scene, e.g. a bird's eye view. As collecting real-world data for this purpose would be infeasible, we record and release a photorealistic synthetic dataset featuring one million pairs of frames, captured simultaneously from the car dashboard and from a bird's eye view. On these data we show that a deep convolutional network can indeed be trained to infer the bird's eye spatial occupancy of the scene starting from raw detections in the first-person view.

Exploring a different path towards the same goal, we introduce a two-branch convolutional encoder network based on differentiable rendering that jointly estimates each vehicle's category and its 6-DoF pose in the scene. Once the category and 6-DoF pose of each vehicle are known, this information suffices to render novel viewpoints in which the arrangement of objects and their mutual poses are preserved.

Finally, we remove the need to fix a particular viewpoint in advance (e.g. bird's eye view) by presenting a framework for generating novel views of a vehicle from truly arbitrary 3D viewpoints, given a single monocular image. Differently from parametric (i.e. entirely learning-based) methods, we show how a-priori geometric knowledge about the object and the 3D world can be successfully integrated into a deep-learning-based image generation framework. As this geometric component is not learnt, we call our approach semi-parametric. This careful blend of parametric and non-parametric components allows us to i) operate in a real-world scenario, ii) preserve high-frequency visual information such as textures, and iii) handle truly arbitrary 3D roto-translations of the input.
We also show that our approach can be easily extended to other rigid objects with completely different topology, even in the presence of concave structures and holes. Comprehensive experimental analyses against state-of-the-art competitors show the efficacy of our proposals both quantitatively and perceptually.
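To make the frontal-to-bird's-eye mapping described above more concrete, the following is a minimal PyTorch sketch of an encoder-decoder network that turns a rasterized first-person detection map into a top-down occupancy grid. It is purely illustrative: the class name, layer sizes and resolutions are assumptions for this sketch, not the architecture used in the thesis.

```python
# Illustrative sketch only: a toy encoder-decoder mapping a rasterized
# first-person detection map to a bird's-eye occupancy grid. All names,
# shapes and hyper-parameters are assumptions, not the thesis model.
import torch
import torch.nn as nn

class FrontalToBirdsEye(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: progressively downsample the frontal-view detection map.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder: upsample back to a top-down occupancy grid.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, frontal_detections):
        # frontal_detections: (B, 1, H, W) mask of detected vehicles
        # returns: (B, 1, H, W) logits over bird's-eye occupancy cells
        return self.decoder(self.encoder(frontal_detections))

model = FrontalToBirdsEye()
frontal = torch.rand(4, 1, 128, 128)     # dummy batch of detection maps
occupancy_logits = model(frontal)        # (4, 1, 128, 128)
# Per-cell occupancy can be supervised with a binary cross-entropy loss:
target = torch.randint(0, 2, occupancy_logits.shape).float()
loss = nn.functional.binary_cross_entropy_with_logits(occupancy_logits, target)
```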
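Similarly, the two-branch encoder can be pictured as a shared convolutional backbone feeding a classification head and a 6-DoF regression head. Again a hedged sketch under stated assumptions: names, layer sizes and the translation-plus-axis-angle pose parameterisation are choices made here for illustration, and the differentiable-rendering component of the thesis model is omitted entirely.

```python
# Illustrative sketch only: shared backbone with two heads, one for the
# vehicle category and one regressing a 6-DoF pose (3-D translation plus
# axis-angle rotation). Layer sizes and parameterisation are assumptions.
import torch
import torch.nn as nn

class TwoBranchPoseNet(nn.Module):
    def __init__(self, num_categories=10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),   # -> (B, 64) features
        )
        self.category_head = nn.Linear(64, num_categories)  # class logits
        self.pose_head = nn.Linear(64, 6)  # (tx, ty, tz, rx, ry, rz)

    def forward(self, image):
        features = self.backbone(image)
        return self.category_head(features), self.pose_head(features)

net = TwoBranchPoseNet()
logits, pose = net(torch.rand(2, 3, 224, 224))  # shapes: (2, 10), (2, 6)
```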


Citation:

Palazzi, Andrea. "Smart Cities: Connecting the Visual Viewpoints of Driver, Vehicle and Infrastructure." 2020.

