

About the Dataset

MOTSynth is a large synthetic dataset for pedestrian detection and tracking in urban scenarios, created with the highly photorealistic video game Grand Theft Auto V, developed by Rockstar North. It consists of 764 full-HD videos, each 1800 frames long and recorded at 20 fps. MOTSynth is the result of a collaboration between UNIMORE and the Technical University of Munich.


  • 1,375,200 frames
  • 33,397,139 instances
  • 3D pose labels
  • Instance Segmentation labels
  • Depth labels
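
As a quick sanity check, the frame total can be reproduced from the per-video figures quoted above. The short Python sketch below does that arithmetic and also derives a few numbers the page does not state (per-video duration, total footage, average instances per frame). The function name `dataset_stats` and the dictionary keys are illustrative only and are not part of any MOTSynth tooling.

```python
# Figures quoted in the dataset description above.
NUM_VIDEOS = 764
FRAMES_PER_VIDEO = 1800
FPS = 20
TOTAL_INSTANCES = 33_397_139


def dataset_stats(num_videos, frames_per_video, fps, total_instances):
    """Derive summary statistics from the published dataset figures.

    Hypothetical helper for illustration; not part of the dataset release.
    """
    total_frames = num_videos * frames_per_video
    seconds_per_video = frames_per_video / fps
    total_hours = num_videos * seconds_per_video / 3600
    instances_per_frame = total_instances / total_frames
    return {
        "total_frames": total_frames,
        "seconds_per_video": seconds_per_video,
        "total_hours": total_hours,
        "instances_per_frame": instances_per_frame,
    }


stats = dataset_stats(NUM_VIDEOS, FRAMES_PER_VIDEO, FPS, TOTAL_INSTANCES)
for name, value in stats.items():
    print(f"{name}: {value:,.2f}")
```

Under these figures, each video lasts 90 seconds and the product of videos and frames per video matches the 1,375,200 frames listed above. The average of roughly 24 annotated instances per frame is a derived estimate, not a number reported on this page.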

Download the Dataset

You can download the dataset here.


We acknowledge funding from the InSecTT project, funded by the ECSEL Joint Undertaking (JU) under GA 876038. The JU receives support from the EU H2020 Research and Innovation programme and AU, SWE, SPA, IT, FR, POR, IRE, FIN, SLO, PO, NED, TUR. Partial funding was provided by the Humboldt Foundation through the Sofja Kovalevskaja Award.


We believe in open research and we are happy if you find this data useful. If you use it, please cite our work.

   title     = {MOTSynth: How Can Synthetic Data Help Pedestrian Detection and Tracking?},
   author    = {Matteo Fabbri and Guillem Bras{\'o} and Gianluca Maugeri and Aljo{\v{s}}a O{\v{s}}ep and 
                Riccardo Gasparini and Orcun Cetintas and Simone Calderara and Laura Leal-Taix{\'e} and Rita Cucchiara},
   booktitle = {International Conference on Computer Vision (ICCV)},
   year      = {2021}