Automatic segmentation of digitalized historical manuscripts

Abstract: The artistic content of historical manuscripts poses many challenges for automatic text extraction, picture segmentation, and retrieval by similarity. In particular, this work addresses the automatic extraction of meaningful pictures, distinguishing them from handwritten text and from floral and abstract decorations. The proposed solution first employs a circular statistics description of a directional histogram in order to extract text. Visual descriptors are then computed over the pictorial regions of the page: the semantic content is distinguished from the decorative parts using color histograms and a novel texture feature called the Gradient Spatial Dependency Matrix. The feature vectors are finally processed using an embedding procedure that improves performance in the subsequent SVM classification. Results for both feature extraction and embedding-based classification are reported, supporting the effectiveness of the proposal on high-resolution replicas of artistic manuscripts.
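
As a rough illustration of what a circular statistics description of a directional histogram can look like, the sketch below computes the circular mean and circular variance of a histogram of directions. It is not taken from the paper: the function name `circular_stats`, the assumption that the bins evenly span [0, 2*pi), and the use of bin centres as representative angles are all choices made here for illustration. A strongly oriented block (such as lines of handwriting) would be expected to give a low circular variance, while an unstructured block would give a value close to 1.

```python
import math


def circular_stats(hist):
    """Return (mean_angle, circular_variance) for a directional histogram.

    Assumption (not from the paper): the bins evenly span [0, 2*pi) and
    each bin is represented by the angle at its centre.
    """
    total = sum(hist)
    if total <= 0:
        raise ValueError("histogram must contain positive mass")

    n_bins = len(hist)
    step = 2.0 * math.pi / n_bins

    # Weighted average of the unit vectors pointing at each bin centre.
    c = sum(w * math.cos((k + 0.5) * step) for k, w in enumerate(hist)) / total
    s = sum(w * math.sin((k + 0.5) * step) for k, w in enumerate(hist)) / total

    # Mean resultant length: 1 for a single direction, near 0 when spread out.
    r = math.hypot(c, s)

    mean_angle = math.atan2(s, c) % (2.0 * math.pi)
    circular_variance = 1.0 - r
    return mean_angle, circular_variance


# All mass in one bin: a single dominant direction, variance close to 0.
peaked_mean, peaked_var = circular_stats([0, 0, 5, 0, 0, 0, 0, 0])

# Uniform histogram: no dominant direction, variance close to 1.
_, flat_var = circular_stats([1] * 8)
```

Thresholding the circular variance is one simple way such a descriptor could separate oriented regions from the rest; the decision rule actually used in the paper is not stated in the abstract.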


Grana, Costantino; Borghesani, Daniele; Cucchiara, Rita, "Automatic segmentation of digitalized historical manuscripts", Multimedia Tools and Applications, vol. 55, pp. 483-506, 2011. DOI: 10.1007/s11042-010-0561-8

