![AImageLab](https://aimagelab.ing.unimore.it/imagelab/images/header_aimagelab.png)
Ludovico Antonio Muratori (LAM) Dataset
S. Cascianelli, V. Pippi, M. Maarand, M. Cornia, L. Baraldi, C. Kermorvant, R. Cucchiara
The Ludovico Antonio Muratori (LAM) dataset is the largest line-level handwritten text recognition (HTR) dataset to date and contains 25,823 lines from Italian ancient manuscripts edited by a single author over 60 years. The dataset comes in two configurations: a basic split and a date-based split, which takes into account the age of the author. The first setting is intended for studying HTR on ancient documents in Italian, while the second focuses on the ability of HTR systems to recognize text written by the same writer in time periods for which no training data are available.
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000961.png)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000944.jpg)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000945.jpg)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000946.jpg)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000947.jpg)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000948.jpg)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000949.jpg)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000950.jpg)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000951.jpg)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000952.jpg)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000953.jpg)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000954.jpg)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000955.jpg)
Please cite with the following BibTeX:
@inproceedings{cascianelli2022lam,
  title={The LAM Dataset: A Novel Benchmark for Line-Level Handwritten Text Recognition},
  author={Cascianelli, Silvia and Pippi, Vittorio and Maarand, Martin and Cornia, Marcella and Baraldi, Lorenzo and Kermorvant, Christopher and Cucchiara, Rita},
  booktitle={International Conference on Pattern Recognition},
  year={2022}
}
Dataset info
To make LAM easy to access and use, we describe some properties of the dataset that may be useful when developing with it.
The images below show example line images, followed by the format of each annotation record.
Samples
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000883.jpg)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000884.jpg)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000885.jpg)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000886.jpg)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000887.jpg)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000901.jpg)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000902.jpg)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000889.jpg)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000890.jpg)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000891.jpg)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000892.jpg)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000893.jpg)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000894.jpg)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000895.jpg)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000896.jpg)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000888.jpg)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000897.jpg)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000898.jpg)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000899.jpg)
![](https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000900.jpg)
{
"decade_id": ,
"img": ".jpg",
"nameset": "",
"text": "",
"width": ,
"height":
}
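
The record layout above can be read with a few lines of Python. This is a minimal sketch, assuming the annotations are stored as a JSON list of such objects. The sample values, the image names, and the nameset labels used here are hypothetical placeholders for illustration and are not taken from the dataset.

```python
import json
from collections import defaultdict

# Hypothetical records following the field layout shown above.
# All values are made up for illustration.
SAMPLE_JSON = """
[
  {"decade_id": 1, "img": "a.jpg", "nameset": "train",
   "text": "esempio di riga", "width": 658, "height": 53},
  {"decade_id": 2, "img": "b.jpg", "nameset": "test",
   "text": "altra riga", "width": 700, "height": 60}
]
"""

REQUIRED_FIELDS = ("decade_id", "img", "nameset", "text", "width", "height")


def load_records(raw):
    """Parse a JSON list of annotation records and check required fields."""
    records = json.loads(raw)
    for rec in records:
        missing = [f for f in REQUIRED_FIELDS if f not in rec]
        if missing:
            raise KeyError(f"record {rec.get('img')!r} lacks fields: {missing}")
    return records


def group_by_nameset(records):
    """Group records by the value of their 'nameset' field."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["nameset"]].append(rec)
    return dict(groups)


records = load_records(SAMPLE_JSON)
groups = group_by_nameset(records)
```

Checking the fields up front makes a malformed record fail at load time rather than in the middle of training.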
Images info
To facilitate network design, we provide the width and height distributions of all samples in the dataset. The following table reports the minimum, maximum, and average image sizes.
Dimension | Min | Max | Avg |
---|---|---|---|
Width | 29 px | 1700 px | 658 px |
Height | 14 px | 235 px | 53 px |
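
Since line heights vary widely, a common preprocessing step for line-level HTR is to rescale every image to a fixed height while preserving its aspect ratio. The sketch below only computes the target size. The function name and the target height of 64 px are assumptions chosen for illustration, not settings prescribed by the dataset.

```python
def resize_to_height(width, height, target_height=64):
    """Return (new_width, target_height), preserving the aspect ratio.

    target_height=64 is an illustrative default, not a dataset setting.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Scale the width by the same factor applied to the height.
    new_width = max(1, round(width * target_height / height))
    return new_width, target_height


# Example with the average width and height from the table above.
# Note that these two averages need not describe any single image.
avg_size = resize_to_height(658, 53)
```

The resulting widths still differ from image to image, so batches usually need padding up to the widest sample.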
Dataset comparison
Designing and developing effective HTR solutions requires large data collections that capture the visual variability of the task and cover different languages. In the following, we compare LAM with other line-level datasets written in Western scripts, since these are the most closely related to our proposed dataset.
Text info
The label length distribution of LAM is very similar to that of IAM. The graphs show no evident difference between the distributions of the different sets.
Min | Max | Avg |
---|---|---|
Decade info
The dataset is divided into six decades spanning Ludovico Antonio Muratori's life. Each sample has a decade_id field that gives its decade index, as listed in the following table:
Index | Date range | Samples |
---|---|---|
Splits
Split | Train | Validation | Test | Total |
---|---|---|---|---|
Basic | 19830 (77%) | 2470 (10%) | 3523 (13%) | 25823 |
Leave decade 1 out | 17205 (68%) | 1911 (8%) | 6067 (24%) | 25183 |
Leave decade 2 out | 17205 (77%) | 1911 (9%) | 3276 (15%) | 22392 |
Leave decade 3 out | 17205 (82%) | 1911 (9%) | 1950 (9%) | 21066 |
Leave decade 4 out | 17205 (68%) | 1911 (8%) | 6042 (24%) | 25158 |
Leave decade 5 out | 17205 (75%) | 1911 (8%) | 3858 (17%) | 22974 |
Leave decade 6 out | 17205 (74%) | 1911 (8%) | 3990 (17%) | 23106 |
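
The idea behind the "leave decade out" rows can be sketched with the decade_id field: all samples from one decade are held out for testing, and the rest are used for training and validation. This is only an illustration under stated assumptions. The function names, the 10% validation fraction, and the random shuffling are hypothetical, and the sketch does not reproduce the counts in the table above. For benchmarking, use the splits distributed with the dataset.

```python
import random


def leave_decade_out(records, held_out_decade):
    """Split records into (seen, held_out) by their 'decade_id' field."""
    seen = [r for r in records if r["decade_id"] != held_out_decade]
    held_out = [r for r in records if r["decade_id"] == held_out_decade]
    if not held_out:
        raise ValueError(f"no samples with decade_id={held_out_decade}")
    return seen, held_out


def train_val_split(records, val_fraction=0.1, seed=0):
    """Shuffle deterministically and carve out a validation subset.

    val_fraction and seed are illustrative choices, not dataset settings.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical toy records; only 'decade_id' and 'img' matter here.
toy = [
    {"decade_id": d, "img": f"{i}.jpg"}
    for i, d in enumerate([1, 1, 2, 2, 3, 3, 3, 4, 5, 6])
]
seen, test = leave_decade_out(toy, held_out_decade=3)
train, val = train_val_split(seen)
```

Because the test set contains only the held-out decade, the model is evaluated on handwriting from a period it never saw during training.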
Contacts
If you have any questions about our dataset, please open an issue in the public issues section of this GitHub repo. Alternatively, drop us an e-mail at vittorio.pippi [at] unimore.it or silvia.cascianelli [at] unimore.it.
Acknowledgment
This work was supported by the "AI for Digital Humanities" project, funded by "Fondazione di Modena", and by the "DHMoRe Lab" project, funded by "Regione Emilia Romagna". The authors thank Dr. Maria Ludovica Piazzi, Dr. Rosiana Schiuma, and the Estense Digital Library for their contribution and support in preparing the dataset.
![](https://www.fondazionedimodena.it/wp-content/uploads/2019/06/logo@2x.png)
![](https://seeklogo.com/images/R/regione-emilia-romagna-logo-FAA2DE0074-seeklogo.com.png)