Scene detection in Broadcast Videos
There is currently strong interest in re-using video content from major broadcasting networks, which have long produced high-quality edited videos for popular science purposes, such as documentaries and similar programs. Unfortunately, re-using videos in one's own presentations or video-aided lectures is not easy: it requires video editing skills and tools, on top of the difficulty of finding the parts of a video that actually contain the content the instructor is interested in. Story detection has been recognized as a tool that can help in this situation, going beyond frames and even beyond simple editing units such as shots. The task is to identify coherent sequences in videos without any help from the editor or publisher. Our final goal is improved access to broadcast video footage and possible re-use of the large body of available video content, with direct management of user-selected video clips.
We are working towards a complete pipeline for story detection that includes a shot detection algorithm and new approaches for grouping shots into coherent stories. We have also tackled the problem of evaluating story segmentation results by proposing an improved performance measure, which addresses frequently observed cases in which the numeric score differs substantially from the expected result.
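As a rough illustration of the three stages mentioned above (shot detection, grouping shots into stories, and scoring a segmentation), the following is a minimal sketch. It is not the method described in the publications: the per-frame histograms, the L1 distance, both thresholds, and the overlap-based score are all assumptions chosen only to keep the example small and self-contained.

```python
from typing import List, Sequence, Tuple

Histogram = Sequence[float]
Segment = Tuple[int, int]  # inclusive (first_frame, last_frame)


def l1_distance(a: Histogram, b: Histogram) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def detect_shots(frames: List[Histogram], cut_threshold: float) -> List[Segment]:
    """Start a new shot wherever consecutive frame histograms differ
    by more than cut_threshold (a hypothetical, hand-picked value)."""
    if not frames:
        return []
    shots, start = [], 0
    for i in range(1, len(frames)):
        if l1_distance(frames[i - 1], frames[i]) > cut_threshold:
            shots.append((start, i - 1))
            start = i
    shots.append((start, len(frames) - 1))
    return shots


def mean_histogram(frames: List[Histogram], shot: Segment) -> List[float]:
    start, end = shot
    count = end - start + 1
    return [sum(frames[i][d] for i in range(start, end + 1)) / count
            for d in range(len(frames[start]))]


def group_shots(frames: List[Histogram], shots: List[Segment],
                merge_threshold: float) -> List[Segment]:
    """Merge each shot into the current scene when its mean histogram is
    close to that of the previous shot; otherwise open a new scene."""
    if not shots:
        return []
    scenes = [shots[0]]
    prev_mean = mean_histogram(frames, shots[0])
    for shot in shots[1:]:
        cur_mean = mean_histogram(frames, shot)
        if l1_distance(prev_mean, cur_mean) <= merge_threshold:
            scenes[-1] = (scenes[-1][0], shot[1])
        else:
            scenes.append(shot)
        prev_mean = cur_mean
    return scenes


def iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two inclusive frame ranges."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union


def one_way_score(reference: List[Segment], detected: List[Segment]) -> float:
    """Length-weighted mean, over reference scenes, of the best IoU
    with any detected scene."""
    if not reference or not detected:
        return 0.0
    total = sum(end - start + 1 for start, end in reference)
    weighted = sum(max(iou(r, d) for d in detected) * (r[1] - r[0] + 1)
                   for r in reference)
    return weighted / total


def symmetric_score(reference: List[Segment], detected: List[Segment]) -> float:
    """Average of both directions, so over- and under-segmentation
    are both penalised."""
    return 0.5 * (one_way_score(reference, detected)
                  + one_way_score(detected, reference))
```

For example, with six toy frames `[[1.0, 0.0], [1.0, 0.0], [0.6, 0.4], [0.6, 0.4], [0.0, 1.0], [0.0, 1.0]]`, `detect_shots(frames, cut_threshold=0.5)` yields three two-frame shots, and `group_shots(frames, shots, merge_threshold=1.0)` merges the first two into one scene. A real system would replace the histograms with learned features and the greedy merge with a proper clustering step.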
Scene-based retrieval
We also propose a retrieval pipeline for video collections, which aims to retrieve the most significant parts of an edited video for a given query and to represent them with thumbnails that are both semantically meaningful and aesthetically remarkable. Videos are first segmented into coherent, story-telling scenes; a retrieval algorithm based on deep learning then retrieves the most significant scenes for a textual query. A ranking strategy based on deep features is also used to select the best thumbnail to display for each scene.
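The retrieval and thumbnail-ranking idea can be sketched as follows. This is only an illustration under stated assumptions, not the published algorithm: it assumes each keyframe already has a precomputed semantic embedding in the same space as the query and a precomputed aesthetic score in [0, 1]; the `Keyframe` and `Scene` classes, the cosine similarity, and the mixing weight `alpha` are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Keyframe:
    frame_id: str
    embedding: Sequence[float]  # hypothetical semantic feature vector
    aesthetic: float            # hypothetical aesthetic score in [0, 1]


@dataclass
class Scene:
    scene_id: str
    keyframes: List[Keyframe]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def thumbnail_score(query: Sequence[float], kf: Keyframe, alpha: float) -> float:
    """Blend semantic relevance to the query with the aesthetic score."""
    return alpha * cosine(query, kf.embedding) + (1.0 - alpha) * kf.aesthetic


def retrieve(query: Sequence[float], scenes: List[Scene],
             alpha: float = 0.7, top_k: int = 3) -> List[Tuple[float, str, str]]:
    """Return up to top_k (relevance, scene_id, thumbnail_frame_id) tuples,
    most relevant scene first."""
    results = []
    for scene in scenes:
        if not scene.keyframes:
            continue  # nothing to show for an empty scene
        # Scene relevance: best semantic match among its keyframes.
        relevance = max(cosine(query, kf.embedding) for kf in scene.keyframes)
        # Thumbnail: keyframe with the best blended score.
        best = max(scene.keyframes,
                   key=lambda kf: thumbnail_score(query, kf, alpha))
        results.append((relevance, scene.scene_id, best.frame_id))
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:top_k]
```

Separating scene relevance from thumbnail choice means a scene is ranked purely on how well it matches the query, while the image shown for it can still favour a visually pleasing frame over a marginally more relevant one.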
Slides from the ICMR 2016 oral presentation
Demo interface
Datasets and Source Code
- Source code: Imagelab Shot Detector
- Source code: Caffe models from the ACMMM15 paper
- Dataset: RAI dataset
- Dataset: BBC Planet Earth dataset
Acknowledgments
We acknowledge the CINECA award under the ISCRA initiative, for the availability of high performance computing resources and support.
Publications
1 | Baraldi, Lorenzo; Grana, Costantino; Cucchiara, Rita "Recognizing and Presenting the Storytelling Video Structure with Deep Multimodal Networks" IEEE Transactions on Multimedia, vol. 19, pp. 955-968, 2017 | DOI: 10.1109/TMM.2016.2644872 Journal |
2 | Baraldi, Lorenzo; Grana, Costantino; Cucchiara, Rita "NeuralStory: an Interactive Multimedia System for Video Indexing and Re-use" Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing, Florence, Italy, 19-21 June 2017 | DOI: 10.1145/3095713.3095735 Conference |
3 | Baraldi, Lorenzo; Grana, Costantino; Messina, Alberto; Cucchiara, Rita "A Browsing and Retrieval System for Broadcast Videos using Scene Detection and Automatic Annotation" Proceedings of the 2016 ACM on Multimedia Conference, Amsterdam, The Netherlands, pp. 733-734, 15-19 October 2016 | DOI: 10.1145/2964284.2973825 Conference |
4 | Baraldi, Lorenzo; Grana, Costantino; Cucchiara, Rita "Scene-driven Retrieval in Edited Videos using Aesthetic and Semantic Deep Features" Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, New York, USA, pp. 23-29, 6-9 June 2016 | DOI: 10.1145/2911996.2912012 Conference |
5 | Baraldi, Lorenzo; Grana, Costantino; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita "Shot, scene and keyframe ordering for interactive video re-use" Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, vol. 4, Rome, pp. 626-631, 27-29 February 2016 | DOI: 10.5220/0005768706260631 Conference |
6 | Baraldi, Lorenzo; Grana, Costantino; Cucchiara, Rita "Analysis and Re-use of Videos in Educational Digital Libraries with Automatic Scene Detection" Digital Libraries on the Move, vol. 612, Bolzano, pp. 155-164, 29-30 January 2016 | DOI: 10.1007/978-3-319-41938-1_16 Conference |
7 | Baraldi, Lorenzo; Grana, Costantino; Cucchiara, Rita "A Deep Siamese Network for Scene Detection in Broadcast Videos" Proceedings of the 23rd ACM international conference on Multimedia, Brisbane, Australia, pp. 1199-1202, 26-30 October 2015 | DOI: 10.1145/2733373.2806316 Conference |
8 | Baraldi, Lorenzo; Grana, Costantino; Cucchiara, Rita "Shot and Scene Detection via Hierarchical Clustering for Re-using Broadcast Video" Computer Analysis of Images and Patterns. Part I, vol. 9256, Valletta, Malta, pp. 801-811, 2-4 September 2015 | DOI: 10.1007/978-3-319-23192-1_67 Conference |
9 | Baraldi, Lorenzo; Grana, Costantino; Cucchiara, Rita "Scene segmentation using temporal clustering for accessing and re-using broadcast video" Proceedings - IEEE International Conference on Multimedia and Expo, vol. 2015-, Torino, Italy, pp. 1-6, 2015 | DOI: 10.1109/ICME.2015.7177476 Conference |
10 | Baraldi, Lorenzo; Grana, Costantino; Cucchiara, Rita "Measuring scene detection performance" Pattern Recognition and Image Analysis, vol. 9117, Santiago de Compostela, Spain, pp. 395-403, 17-19 June 2015 | DOI: 10.1007/978-3-319-19390-8_45 Conference |