

About the Dataset

MOTSynth is a large-scale synthetic dataset for pedestrian detection and tracking in urban scenarios, created using the highly photorealistic video game Grand Theft Auto V, developed by Rockstar North. It consists of 768 full-HD videos, each 1800 frames long and recorded at 25 fps. MOTSynth was born from a collaboration between UNIMORE and the Technical University of Munich.


  • 1,382,400 frames
  • 40,780,800 instances
  • 3D pose labels
  • Instance Segmentation labels
  • Depth labels
  • Optical Flow labels
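
As a quick sanity check, the headline figures above can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted on this page (768 videos, 1800 frames each, 25 fps, 40,780,800 instances). The derived quantities (seconds per video, total footage, average instances per frame) are our own calculations and are not stated elsewhere on this page, and the constant and function names are chosen purely for illustration.

```python
# Figures quoted on this page.
NUM_VIDEOS = 768
FRAMES_PER_VIDEO = 1800
FPS = 25
TOTAL_INSTANCES = 40_780_800


def dataset_stats():
    """Derive summary statistics from the quoted figures (illustrative helper)."""
    total_frames = NUM_VIDEOS * FRAMES_PER_VIDEO
    return {
        "total_frames": total_frames,
        # Length of one video in seconds.
        "seconds_per_video": FRAMES_PER_VIDEO / FPS,
        # Total footage across all videos, in hours.
        "total_hours": total_frames / FPS / 3600,
        # Average number of annotated instances per frame.
        "instances_per_frame": TOTAL_INSTANCES / total_frames,
    }


if __name__ == "__main__":
    for name, value in dataset_stats().items():
        print(f"{name}: {value}")
```

The frame count matches the 1,382,400 frames listed above, and the instance count works out to an average of 29.5 instances per frame.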

Comparison with other datasets

Download the Dataset

Coming soon...


We believe in open research and we are happy if you find this data useful. If you use it, please cite our work.

@inproceedings{fabbri2021motsynth,
   title     = {MOTSynth: How Can Synthetic Data Help Pedestrian Detection and Tracking?},
   author    = {Matteo Fabbri and Guillem Bras{\'o} and Gianluca Maugeri and Aljo{\v{s}}a O{\v{s}}ep and
                Riccardo Gasparini and Orcun Cetintas and Simone Calderara and Laura Leal-Taix{\'e} and Rita Cucchiara},
   booktitle = {International Conference on Computer Vision (ICCV)},
   year      = {2021}
}