
Diffusion-generated Deepfake Detection dataset (D3)

Lorenzo Baraldi¹˒², Federico Cocchi¹˒², Rosario Di Carlo³, Marcella Cornia¹, Lorenzo Baraldi¹, Alessandro Nicolosi³, Rita Cucchiara¹
¹University of Modena and Reggio Emilia, ²University of Pisa, ³Leonardo S.p.A.

D3 is a multimodal dataset containing 9.2M generated images, produced with four state-of-the-art diffusion-based generators. Each image is generated starting from a LAION-400M caption, so that it corresponds to a realistic textual description.

A Dataset for Large-Scale Deepfake Detection

Existing deepfake detection datasets are limited in the diversity of their generators and in the quantity of images. We therefore create and release a new dataset that can support training deepfake detection methods from scratch. Our Diffusion-generated Deepfake Detection dataset (D3) contains nearly 2.3M records and 11.5M images. Each record consists of a prompt, a real image, and four generated images, one per generator (a minimal sketch of this structure is given below). Prompts and corresponding real images are taken from LAION-400M, while fake images are generated from the same prompt using different text-to-image generators.
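
As an illustration of this record structure, the following Python sketch models a single D3 record and flattens it into (image, label) pairs for training a detector. The field names and types are our own assumptions for exposition, not the dataset's actual schema.

```python
from dataclasses import dataclass
from PIL import Image

GENERATORS = ("SD-1.4", "SD-2.1", "SD-XL", "DF-IF")

@dataclass
class D3Record:
    """One D3 record: a prompt, its real LAION-400M image, and one
    fake image per generator (field names are illustrative)."""
    prompt: str                    # LAION-400M caption used for generation
    real_image: Image.Image       # original image paired with the caption
    fakes: dict[str, Image.Image] # generator name -> generated image

def as_training_pairs(record: D3Record):
    """Flatten a record into (image, label) pairs: 0 = real, 1 = fake."""
    yield record.real_image, 0
    for name in GENERATORS:
        yield record.fakes[name], 1
```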

We employ four state-of-the-art open-source diffusion models, namely Stable Diffusion 1.4 (SD-1.4), Stable Diffusion 2.1 (SD-2.1), Stable Diffusion XL (SD-XL), and DeepFloyd IF (DF-IF). While the first three generators are variants of the Stable Diffusion approach, DeepFloyd IF is strongly inspired by Imagen and thus represents a different generation technique.
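
The exact generation script is not prescribed here; as a rough sketch, text-to-image generation with these models can be run through the Hugging Face diffusers library. The model IDs below are the public Hub checkpoints for three of the generators; the generation call and its defaults are our own choices, not the settings used to build D3.

```python
import torch
from diffusers import AutoPipelineForText2Image

# Public Hub checkpoints; DeepFloyd IF uses a multi-stage pipeline
# and is omitted here for brevity.
MODEL_IDS = {
    "SD-1.4": "CompVis/stable-diffusion-v1-4",
    "SD-2.1": "stabilityai/stable-diffusion-2-1",
    "SD-XL": "stabilityai/stable-diffusion-xl-base-1.0",
}

def generate(prompt: str, model_id: str):
    """Generate one image for a LAION-400M caption with a given model."""
    pipe = AutoPipelineForText2Image.from_pretrained(
        model_id, torch_dtype=torch.float16
    ).to("cuda")
    return pipe(prompt).images[0]  # a PIL.Image

image = generate("a photo of a red vintage bicycle", MODEL_IDS["SD-1.4"])
image.save("fake.png")
```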

With the aim of increasing the variance of the dataset, images have been generated at different resolutions and aspect ratios: 256×256, 512×512, 640×480, and 640×360. Moreover, to mimic the distribution of real images, we also employ a variety of encoding and compression methods (BMP, GIF, JPEG, TIFF, PNG). In particular, we closely follow the distribution of encoding methods of LAION itself, therefore favoring JPEG-encoded images.
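
A minimal sketch of this post-processing step is shown below, using Pillow to resize and re-encode a generated image. The sampling weights are purely illustrative assumptions: the only constraint stated above is that JPEG should dominate, mirroring LAION.

```python
import io
import random
from PIL import Image

RESOLUTIONS = [(256, 256), (512, 512), (640, 480), (640, 360)]

# Illustrative weights only: the exact LAION encoding distribution is
# not reported here, beyond JPEG being the most common format.
FORMATS = ["JPEG", "PNG", "BMP", "GIF", "TIFF"]
WEIGHTS = [0.80, 0.10, 0.04, 0.03, 0.03]  # assumption, not from the paper

def postprocess(image: Image.Image) -> bytes:
    """Resize to a random target resolution and re-encode the image
    with a randomly sampled format, mimicking real-image statistics."""
    size = random.choice(RESOLUTIONS)
    fmt = random.choices(FORMATS, weights=WEIGHTS, k=1)[0]
    out = image.resize(size).convert("P" if fmt == "GIF" else "RGB")
    buf = io.BytesIO()
    out.save(buf, format=fmt)
    return buf.getvalue()
```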

Secure and Safe AI

This work has been carried out within the Multimedia use case of the European network ELSA - European Lighthouse on Secure and Safe AI. The objective of the Multimedia use case is to develop effective solutions for detecting and mitigating the spread of deepfake images in multimedia content.

Machine-generated images are becoming increasingly common in the digital world, thanks to the spread of Deep Learning models that can generate visual data, such as Generative Adversarial Networks and Diffusion Models. While image generation tools can be employed for lawful goals (e.g., to assist content creators, generate simulated datasets, or enable multimodal interactive applications), there is a growing concern that they might also be used for illegal and malicious purposes, such as the forgery of natural images or the generation of images in support of fake news, misogyny, or revenge porn.

While the results obtained in the past few years contained artifacts that made generated images easy to recognize, today's outputs are far harder to distinguish from a purely perceptual point of view. In this context, assessing the authenticity of images becomes a fundamental goal for security and for guaranteeing a degree of trustworthiness of AI algorithms. There is a growing need, therefore, for automated methods that can assess the authenticity of images (and, more generally, multimodal content) and that can keep pace with the constant evolution of generative models, which become more realistic over time.

The Challenge on Deepfake Detection

Join our thrilling competition on deepfake detection and put your skills to the test. As the rise of deepfake technology poses unprecedented challenges, we invite individuals and teams from all backgrounds to showcase their expertise in identifying and debunking manipulated media.