Prediction of Activities and Events by Vision in an Urban Environment
The PREVUE (Predicting activities and Events by Vision in an Urban Environment) project plans to investigate modern Artificial Intelligence approaches for video analysis and event prediction in urban scenarios. Specifically, we claim that two significant Computer Vision topics, namely video surveillance and autonomous driving, are now technologically mature enough to be rethought jointly within a single framework. By simultaneously using both mobile cameras (e.g., mounted on vehicles) and fixed cameras (i.e., installed across a smart city), we will analyse the urban environment (context), the behaviour of humans and moving agents (e.g., autonomous vehicles, bikes, social robots), and their mutual interactions.
We will explore the capabilities, and stress the limits, of algorithms and novel solutions in visual artificial intelligence for predicting different types of anomalies and potentially dangerous events in the city, including suspicious human behaviour detection, panic recognition in individuals and crowds, and potential collisions. The results will improve safety and efficiency in urban life. We aim to build a general framework for the early detection and prediction of urban events that can be used both for surveillance and for providing auxiliary data to autonomous vehicles.
From a scientific point of view, we will investigate active research topics in deep learning and computer vision. First, since modern deep learning techniques depend heavily on the availability of training data, we will collect several fully annotated, partially annotated, and synthetic (simulated) datasets for predictive tasks. We will also address the need to optimize acquisition parameters in camera networks, in order to obtain the best possible acquisition both for dataset collection and for real-time processing.
Moreover, we will investigate the frontier of deep learning in challenging settings such as transfer learning, domain adaptation, semi-supervised learning, and few-shot learning, whose solutions are crucial for adapting prediction systems to new scenarios.
State-of-the-art deep networks, such as Generative Adversarial Networks, Recurrent Neural Networks, and Autoregressive Autoencoders, will be used together with standard Convolutional Neural Networks to build an effective predictive visual intelligence for generating new images and for temporal reasoning.
Finally, thanks to the project's industrial stakeholders, who have already expressed their endorsement, and to the support of public bodies, development and evaluation will be performed at large scale, using massive processing resources and two different large-scale urban evaluation areas.
The results of the project will have a disruptive impact on the scientific community, strengthening the Italian presence in international rankings, thanks also to the datasets we will collect for real needs in real contexts and to the open-source solutions. They will also have a direct impact on society, through smart and inclusive cities, and on the Italian automotive and IT industries.
Computer Vision is a fast-growing area with a huge impact on our daily lives. The scientific community is confident that tomorrow’s Computer Vision will provide artificial systems with “visual intelligence”, the human ability to reason about images in order to predict near-future events and situations from visual information.
Modern computer vision systems can successfully detect urban agents. Consider, for instance, a mother with her kids crossing the street: as drivers, we know that a child might stray from the crossing lines, and thus becomes the focus of our attention. Computers equipped with visual intelligence will be capable of detecting abnormal behaviors and potentially dangerous manoeuvres early, as well as predicting other agents’ interactions. They could also alert all approaching vehicles and/or the driver and, if necessary, actively brake. PREVUE targets these emerging scenarios, focusing on next-generation Computer Vision (CV) solutions, empowered by Deep Learning (DL), that detect predictable or anomalous situations in urban contexts where people, vehicles, and moving robots move and interact.
We aim to create new, scientifically disruptive algorithms, develop software prototypes for predicting near-future events, actions, and situations, test them in real open-world scenarios, and build the key components of new smart-city service platforms for intelligent mobility.
List of the Research Units:
Università degli Studi di Modena e Reggio Emilia, Associated Investigator: Cucchiara Rita
Università degli Studi di Trento, Associated Investigator: Sebe Niculae
Università degli Studi di Padova, Associated Investigator: Ballan Lamberto
Università degli Studi di Salerno, Associated Investigator: Nappi Michele
We received letters or protocols of intent from the following industrial stakeholders:
1. Comune di Modena
2. Magneti Marelli
3. AD Consulting
4. Cluster Trasporti
5. Ferrari S.p.A.
The project started in September 2019 and is expected to last 36 months.
Project publications:
1. Palazzi, Andrea. "Città intelligenti: connettere i punti di vista visuali di guidatore, veicolo e infrastruttura" [Smart cities: connecting the visual viewpoints of driver, vehicle, and infrastructure]. (2020, Other)
2. Abati, Davide. "Identificazione di anomalie nell’attenzione del guidatore e nel comportamento delle persone" [Identifying anomalies in driver attention and in people's behaviour]. (2020, Other)
3. Simoni, Alessandro; Bergamini, Luca; Palazzi, Andrea; Calderara, Simone; Cucchiara, Rita. "Future Urban Scenes Generation Through Vehicles Synthesis." Proceedings of the 25th International Conference on Pattern Recognition, Milan, Italy, 10-15 January 2021. (2020, Conference)
4. Monti, Alessio; Bertugli, Alessia; Calderara, Simone; Cucchiara, Rita. "DAG-Net: Double Attentive Graph Neural Network for Trajectory Forecasting." Proceedings of the 25th International Conference on Pattern Recognition, Milan, Italy, 10-15 January 2021. (2020, Conference)
5. Borghi, Guido; Pini, Stefano; Vezzani, Roberto; Cucchiara, Rita. "Mercury: a vision-based framework for Driver Monitoring." Proceedings of the 3rd International Conference on Intelligent Human Systems Integration: Integrating People and Intelligent Systems (IHSI 2020), Modena, Italy, 19-21 February 2020. DOI: 10.1007/978-3-030-39512-4_17. (2020, Conference)
6. Pini, Stefano; Borghi, Guido; Vezzani, Roberto. "Learn to See by Events: Color Frame Synthesis from Event and RGB Cameras." Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, vol. 4, pp. 37-47, Valletta, Malta, 27-29 February 2020. (2020, Conference)