A MUltimedia platform for Content Enrichment and Search in audiovisual archives
Audiovisual documents are a vital resource for future generations to preserve and recollect past cultures, beliefs, and customs.
A famous example is the LUCE Institute Archive, which offers an invaluable historical and cultural cross-section of the first half of
the 20th century by preserving more than 77,000 digitized films accessible online. Given the Cambrian explosion in the production of
audiovisual documents witnessed over the last century, cultural heritage preservation faces new challenges in managing ever-larger
collections of digitized audiovisual documents and keeping them accessible. The RAI Italian TV digital archive contains more than 1.3M hours of
recorded TV and radio programs and 800,000 movies dating back to 1954, much of which still needs to be cataloged. Overall, the
rate of production of audiovisual material far exceeds the resources available to build and maintain accessible archives.
In this context, Artificial Intelligence models can help increase the accessibility of audiovisual archives by automatically understanding their content, extracting information, and indexing it so that it is easily searchable. Existing methods can analyze visual
content and retrieve knowledge based on user-defined queries, but they are largely limited to static images and to recognizing and
describing generic content rooted in English-speaking/American culture. This clearly hinders their applicability to audiovisual and
historical archives.
The MUCES project will make a radical change by investigating and developing innovative Deep Learning models that make unlabeled
audiovisual archives of the Italian cultural patrimony searchable through natural language and exemplar queries in a personalized
manner. In particular, the project will develop, train, and publicly release models that are:
- Fully multi-modal and natively designed to work on videos, exploiting their inherent multi-modal nature by jointly considering
motion, appearance, and audio;
- Personalizable and adaptable to long-tail concepts with scarce annotations, making them suitable for concepts specific to
Italian culture and the cultural heritage domain;
- Deployable in large-scale scenarios and designed to work efficiently on huge archives containing millions of videos.
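To make the retrieval goal above concrete, the sketch below shows the standard embedding-based approach that cross-modal search systems typically build on: videos and queries are mapped into a shared vector space, and search reduces to a nearest-neighbor lookup by cosine similarity. This is a generic illustration under assumed placeholder embeddings, not the project's actual models; the encoders that would produce real video and text embeddings are omitted.

```python
import numpy as np

def normalize(x):
    # L2-normalize rows so that dot products equal cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def search(archive_embeddings, query_embedding, k=3):
    """Return indices of the k archive videos most similar to the query."""
    sims = normalize(archive_embeddings) @ normalize(query_embedding)
    return np.argsort(-sims)[:k]

# Toy example: five "video" embeddings and one "query" embedding.
# Random vectors stand in for the outputs of real multi-modal encoders.
rng = np.random.default_rng(0)
videos = rng.standard_normal((5, 64))
query = videos[2] + 0.05 * rng.standard_normal(64)  # query close to video 2

print(search(videos, query))  # video 2 should rank first
```

At archive scale (millions of videos), the exhaustive dot product above would be replaced by an approximate nearest-neighbor index, which is where the large-scale Content-Based Retrieval expertise mentioned below comes into play.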
At the core of the project lies a new unifying synergy between cutting-edge research in Computer Vision, Machine Learning, and
large-scale Content-Based Retrieval. The project brings together the research experience and expertise of two
internationally recognized research teams: the AImageLab research group at UNIMORE and the Artificial Intelligence for Media and
Humanities laboratory at ISTI CNR, with years of combined expertise in Multimedia, Similarity Search, and Computer Vision. The
project proposes foundational research with direct practical and industrial exploitation. We foresee significant benefits for society,
as well as new research directions opening up in several areas of AI.