
Interactive semantic video search with a large thesaurus of machine-learned audio-visual concepts

The VIDI-Video project aims to integrate and develop state-of-the-art components from machine learning, audio event detection, video processing, interaction and visualization into a fully implemented audio-visual search engine. The engine combines a large number of concept categories, exploits the similarities between classes, and draws on information from different sources: metadata, keyword annotations, audio-visual data, speech, and explicit knowledge.


The scientific impact is to achieve semantic video retrieval by learning a very large thesaurus of concepts. The technological impact is to improve the indexing and retrieval practices currently employed by broadcasting archivists. The societal impact is to broaden access to information.

Main innovation

Video is vital to society and the economy. It plays a key role in the distribution of and access to information, and it will soon be the natural form of communication for the Internet and mobile phones. Current search engines, however, all rely on keyword-based access, leaving semantic access to video data a research problem.

VIDI-Video aims to boost the performance of video search by building a thesaurus of 1,000 detectors, each of which localizes its corresponding semantic concept in the audio stream, the visual stream, or the two combined. The approach is to let the system learn many, mostly weak, semantic detectors instead of modeling a few of them carefully. These detectors describe different aspects of the video content. In combination they provide a rich basis for interactive access to the video library.
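
As a rough illustration of how many weak detectors can jointly support search, the sketch below ranks video shots by averaging the scores of the concepts named in a query. It is not the project's actual method: the shot IDs, concept names, scores, and the `rank_shots` function are all hypothetical, and simple averaging is only one possible way to combine detector outputs.

```python
from typing import Dict, List

# Hypothetical detector outputs: one confidence in [0, 1] per concept and shot.
# Shot IDs, concept names, and scores are invented for illustration only.
SHOT_SCORES: Dict[str, Dict[str, float]] = {
    "shot_001": {"car": 0.7, "road": 0.6, "speech": 0.2},
    "shot_002": {"car": 0.4, "road": 0.1, "speech": 0.9},
    "shot_003": {"car": 0.6, "road": 0.8, "speech": 0.1},
}


def rank_shots(scores: Dict[str, Dict[str, float]],
               query_concepts: List[str]) -> List[str]:
    """Return shot IDs sorted by the mean score of the queried concepts.

    A concept with no detector score for a shot counts as 0.0.
    """
    if not query_concepts:
        raise ValueError("query_concepts must not be empty")

    def combined(shot_id: str) -> float:
        concept_scores = scores[shot_id]
        total = sum(concept_scores.get(c, 0.0) for c in query_concepts)
        return total / len(query_concepts)

    # Highest combined score first.
    return sorted(scores, key=combined, reverse=True)


ranking = rank_shots(SHOT_SCORES, ["car", "road"])
print(ranking)
```

No single score here is decisive, but the combination of "car" and "road" separates the street scenes from the shot dominated by speech.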


VIDI-Video started by integrating state-of-the-art components from machine learning, audio detection, video processing, interaction and visualization into a system which competed successfully in the interactive search competition of TRECVID, a video search benchmark organized by the U.S. National Institute of Standards and Technology. The project aimed to improve in particular on machine learning techniques, visual and audio analysis techniques, and effective interaction.

The concrete output was a fully implemented audio-visual search engine consisting of two main parts, a learning system and a runtime system, where the former fed its results into the latter after each round of training and thesaurus update. The learning system consisted of software for overall video processing, visual analysis, audio analysis, an integrated feature detector, and multimedia querying with a user interface. All subsystems have been delivered and are available both as stand-alone components and integrated into these two final, connected systems. Because each subsystem is modular and also usable on its own, it can be developed independently and exploited efficiently, as commercial opportunities often target components rather than entire systems.
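
The hand-off between the two parts can be pictured with a minimal sketch, assuming a toy setup in which every detector is a threshold on a single numeric feature. The names `train_round`, `RuntimeSystem`, and `update_thesaurus`, as well as the data format, are hypothetical and are not taken from the delivered software.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List, Tuple

# A detector maps one numeric feature of a shot to a confidence of 0.0 or 1.0.
Detector = Callable[[float], float]
# Hypothetical training format: per concept, (feature value, is_positive) pairs.
Examples = Dict[str, List[Tuple[float, bool]]]


def train_round(examples: Examples) -> Dict[str, Detector]:
    """Toy stand-in for the learning system: one threshold detector per concept.

    Assumes positive examples have higher feature values than negative ones.
    """
    detectors: Dict[str, Detector] = {}
    for concept, pairs in examples.items():
        pos = mean(v for v, is_pos in pairs if is_pos)
        neg = mean(v for v, is_pos in pairs if not is_pos)
        threshold = (pos + neg) / 2
        # Bind the threshold as a default argument so each detector keeps its own.
        detectors[concept] = lambda v, t=threshold: 1.0 if v >= t else 0.0
    return detectors


@dataclass
class RuntimeSystem:
    """Toy stand-in for the runtime system, which only applies the thesaurus."""
    thesaurus: Dict[str, Detector] = field(default_factory=dict)
    version: int = 0

    def update_thesaurus(self, detectors: Dict[str, Detector]) -> None:
        # Each training round adds or replaces detectors and bumps the version.
        self.thesaurus.update(detectors)
        self.version += 1

    def annotate(self, feature_value: float) -> Dict[str, float]:
        return {c: d(feature_value) for c, d in self.thesaurus.items()}


runtime = RuntimeSystem()
runtime.update_thesaurus(
    train_round({"music": [(0.9, True), (0.8, True), (0.1, False)]}))
runtime.update_thesaurus(
    train_round({"crowd": [(0.6, True), (0.2, False)]}))
labels = runtime.annotate(0.45)
print(runtime.version, labels)
```

Keeping training and annotation behind separate interfaces mirrors the modularity described above: the runtime side never needs to know how a detector was learned.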


Administrative Details

  • VIDI-VIDEO (IST-045547) is a Specific Targeted Research Project of the European Union's 6th Framework Programme - call 6.
  • The project started on 1 February 2007 and finished on 31 January 2010.
  • There are 8 partners from 6 European countries involved in the project, and the overall funding is 2.79 million euro.

List of Participants

  • Project Coordinator: Universiteit van Amsterdam, The Netherlands
  • Informatics and Telematics Institute, Greece
  • Institute for Systems and Computer Engineering, Portugal
  • University of Surrey, UK
  • Università degli Studi di Firenze, Italy
  • Universitat Autonoma de Barcelona, Spain
  • Beeld en Geluid, The Netherlands
  • Fondazione Rinascimento Digitale, Italy



Project Info

Duration: 01/02/2007 - 31/01/2010
Project Number: IST-045547
Funded by: European Union
Project type: EU - FP 6