A Hierarchical Quasi-Recurrent approach to Video Captioning
Abstract: Video captioning has attracted considerable attention thanks to Recurrent Neural Networks, which can be used both to encode the input video and to generate the corresponding description. In this paper, we present a recurrent video encoding scheme that can discover and exploit the layered structure of the video. Unlike the established encoder-decoder approach, in which a video is encoded continuously by a recurrent layer, we propose to employ Quasi-Recurrent Neural Networks, extending their basic cell with a boundary detector that recognizes discontinuity points between frames or segments and modifies the temporal connections of the encoding layer accordingly. We evaluate our approach on a large-scale dataset, the Montreal Video Annotation Dataset. Experiments demonstrate that our approach can find suitable levels of representation of the input information while reducing computational requirements.
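As an illustration of the idea summarized in the abstract, the following is a minimal sketch of a quasi-recurrent pooling step whose temporal connection is cut at detected boundaries. It is not the authors' implementation. It assumes the standard QRNN f-pooling update, h_t = f_t * h_{t-1} + (1 - f_t) * z_t, and it assumes that a detected boundary resets the previous state to zero and emits the finished segment's last state as a summary. The function name `boundary_aware_f_pooling` and its inputs are hypothetical; in a real model the candidates `z`, the forget gates `f`, and the boundary flags `s` would be produced by learned convolutions over frame features, which are omitted here.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def boundary_aware_f_pooling(
    z: Sequence[Sequence[float]],
    f: Sequence[Sequence[float]],
    s: Sequence[int],
) -> Tuple[List[Vector], List[Vector]]:
    """Run f-pooling over time, resetting the state at boundaries.

    z: candidate vectors, one per time step
    f: forget gates in [0, 1], same shape as z
    s: boundary flags (1 = a new segment starts at this step), assumed given
    Returns (per-step hidden states, one summary vector per segment).
    """
    dim = len(z[0])
    h: Vector = [0.0] * dim
    states: List[Vector] = []
    segments: List[Vector] = []
    for z_t, f_t, s_t in zip(z, f, s):
        if s_t and states:
            # Assumed behavior: close the current segment and cut the
            # temporal connection by zeroing the previous state.
            segments.append(h)
            h = [0.0] * dim
        # Standard QRNN f-pooling, applied element-wise.
        h = [f_i * h_i + (1.0 - f_i) * z_i for f_i, h_i, z_i in zip(f_t, h, z_t)]
        states.append(h)
    if states:
        # The last segment ends with the sequence.
        segments.append(h)
    return states, segments


# Toy input: four one-dimensional "frames", with a boundary before the third.
states, segments = boundary_aware_f_pooling(
    z=[[1.0], [1.0], [3.0], [3.0]],
    f=[[0.5], [0.5], [0.5], [0.5]],
    s=[0, 0, 1, 0],
)
print(states)    # [[0.5], [0.75], [1.5], [2.25]]
print(segments)  # [[0.75], [2.25]]
```

In this toy run the state after the boundary depends only on the frames of the second segment, so each segment summary describes one segment in isolation. A second recurrent layer could then consume the segment summaries to form a higher level of the hierarchy.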
Citation: Bolelli, Federico; Baraldi, Lorenzo; Grana, Costantino "A Hierarchical Quasi-Recurrent approach to Video Captioning" 2018 IEEE International Conference on Image Processing, Applications and Systems (IPAS), Inria Sophia Antipolis, France, pp. 162-167, Dec 12-14, 2018. DOI: 10.1109/IPAS.2018.8708893
- DOI: 10.1109/IPAS.2018.8708893