
Virtual Try-on



Dress Code

[paper] [code] [dataset]

Keywords: Dataset, GAN, Generative AI

Conference: ECCV 

Abstract. Image-based virtual try-on strives to transfer the appearance of a clothing item onto the image of a target person. Existing literature focuses mainly on upper-body clothes (e.g., t-shirts, shirts, and tops) and neglects full-body or lower-body items. This shortcoming arises from a primary factor: current publicly available datasets for image-based virtual try-on do not account for this variety, thus limiting progress in the field. In this research activity, we introduce Dress Code, a novel dataset that contains images of multi-category clothes. Dress Code is more than 3x larger than publicly available datasets for image-based virtual try-on and features high-resolution paired images (1024x768) with front-view, full-body reference models. We propose to learn fine-grained discriminating features to generate HD try-on images with high visual quality and rich details. Specifically, we leverage a semantic-aware discriminator that makes predictions at the pixel level instead of the image patch level. The Dress Code dataset is publicly available at https://github.com/aimagelab/dress-code.
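
The pixel-level, semantic-aware discriminator mentioned above can be pictured as a segmentation-style network: instead of a PatchGAN emitting a single real/fake score per patch, it predicts a semantic class for every pixel of a real image and an extra "fake" class for generated pixels. The PyTorch sketch below illustrates this idea under assumed layer widths, class count, and loss wiring; it is not the released Dress Code implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelSemanticDiscriminator(nn.Module):
    """Segmentation-style discriminator producing per-pixel class logits.

    Real pixels should be classified into one of `num_classes` semantic labels
    (skin, upper clothes, lower clothes, ...), generated pixels into an extra
    "fake" class (index num_classes). Channel widths are illustrative, not the
    paper's exact architecture.
    """

    def __init__(self, in_channels: int = 3, num_classes: int = 18):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        # +1 output channel for the "fake" class
        self.head = nn.Conv2d(256, num_classes + 1, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.head(self.encoder(x))
        # Upsample logits back to input resolution for per-pixel predictions
        return F.interpolate(logits, size=x.shape[-2:], mode="bilinear", align_corners=False)


def discriminator_loss(disc, real, fake, seg_labels, num_classes: int = 18):
    """Cross-entropy over pixels: real pixels -> semantic label, fake pixels -> extra class."""
    real_logits = disc(real)
    fake_logits = disc(fake.detach())
    fake_target = torch.full(seg_labels.shape, num_classes, dtype=torch.long, device=fake.device)
    return F.cross_entropy(real_logits, seg_labels) + F.cross_entropy(fake_logits, fake_target)
```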

Qualitative.


LaDi-VTON

[paper] [code]

Keywords: Dataset, Latent Diffusion Models, Generative AI

Conference: ACMMM 

Abstract. The rapidly evolving fields of e-commerce and metaverse continue to seek innovative approaches to enhance the consumer experience. At the same time, recent advancements in the development of diffusion models have enabled generative networks to create remarkably realistic images. In this context, image-based virtual try-on, which consists of generating a novel image of a target model wearing a given in-shop garment, has yet to capitalize on the potential of these powerful generative solutions. This work introduces LaDI-VTON, the first Latent Diffusion textual Inversion-enhanced model for the Virtual Try-ON task. The proposed architecture relies on a latent diffusion model extended with a novel additional autoencoder module that exploits learnable skip connections to enhance the generation process, preserving the model's characteristics. To effectively maintain the texture and details of the in-shop garment, we propose a textual inversion component that can map the visual features of the garment to the CLIP token embedding space and thus generate a set of pseudo-word token embeddings capable of conditioning the generation process. Experimental results on the Dress Code and VITON-HD datasets demonstrate that our approach outperforms the competitors by a consistent margin, achieving a significant milestone for the task.
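
The textual inversion component described above can be sketched as a small adapter network that maps CLIP visual features of the in-shop garment into a set of pseudo-word embeddings living in the CLIP text-token space, which then condition the latent diffusion model through its text pathway. The dimensions, the MLP design, and names such as TextualInversionAdapter are illustrative assumptions, not the released LaDI-VTON code.

```python
import torch
import torch.nn as nn

class TextualInversionAdapter(nn.Module):
    """Maps CLIP visual features of the in-shop garment to a set of
    pseudo-word token embeddings in the CLIP text-token space.

    `visual_dim`, `token_dim`, `num_tokens`, and the MLP layout are
    illustrative assumptions for this sketch.
    """

    def __init__(self, visual_dim: int = 1024, token_dim: int = 768, num_tokens: int = 16):
        super().__init__()
        self.num_tokens = num_tokens
        self.token_dim = token_dim
        self.mlp = nn.Sequential(
            nn.Linear(visual_dim, 2048),
            nn.GELU(),
            nn.Linear(2048, num_tokens * token_dim),
        )

    def forward(self, garment_features: torch.Tensor) -> torch.Tensor:
        # garment_features: (B, visual_dim) CLIP image embedding of the garment
        pseudo_tokens = self.mlp(garment_features)
        return pseudo_tokens.view(-1, self.num_tokens, self.token_dim)


# Usage sketch: the pseudo-word embeddings are concatenated with the embeddings
# of a textual prompt and passed to the diffusion model's text conditioning path.
adapter = TextualInversionAdapter()
clip_image_features = torch.randn(2, 1024)             # stand-in for CLIP image features
pseudo_word_embeddings = adapter(clip_image_features)  # shape: (2, 16, 768)
```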

Qualitative.

Research Activity Info

Virtual Try-On

Staff