
Virtual Try-On

Virtual try-on is the task of generating realistic images of a person wearing a target garment, without requiring physical trials. In our research, we focus on image-based virtual try-on methods that take as input a photo of a person and an image of a clothing item, and produce a photo-realistic output of the person wearing that item. 



LaDI-VTON

[paper] [code]

Publication: 
"LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On"
D. Morelli, A. Baldrati, G. Cartella, M. Cornia, M. Bertini, R. Cucchiara
Proceedings of the ACM International Conference on Multimedia (ACM MM), 2023

Abstract. The rapidly evolving fields of e-commerce and metaverse continue to seek innovative approaches to enhance the consumer experience. At the same time, recent advancements in the development of diffusion models have enabled generative networks to create remarkably realistic images. In this context, image-based virtual try-on, which consists of generating a novel image of a target model wearing a given in-shop garment, has yet to capitalize on the potential of these powerful generative solutions. This work introduces LaDI-VTON, the first Latent Diffusion textual Inversion-enhanced model for the Virtual Try-ON task. The proposed architecture relies on a latent diffusion model extended with a novel additional autoencoder module that exploits learnable skip connections to enhance the generation process, preserving the model’s characteristics. To effectively maintain the texture and details of the in-shop garment, we propose a textual inversion component that can map the visual features of the garment to the CLIP token embedding space and thus generate a set of pseudo-word token embeddings capable of conditioning the generation process. Experimental results on Dress Code and VITON-HD datasets demonstrate that our approach outperforms the competitors by a consistent margin, achieving a significant milestone for the task.

Keywords: Dataset, Latent Diffusion Models, Generative AI
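
The textual inversion idea in the abstract above (garment features mapped into the token embedding space, yielding pseudo-word embeddings that condition generation) can be illustrated with a toy sketch. This is not the LaDI-VTON implementation: the linear adapter, the toy dimensions, the placeholder position, and all function names below are assumptions made for illustration, and plain Python lists stand in for real CLIP features.

```python
import random

def make_adapter(in_dim, out_dim, n_tokens, seed=0):
    """Hypothetical adapter: one random (out_dim x in_dim) matrix per pseudo-word."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]
        for _ in range(n_tokens)
    ]

def garment_to_pseudo_tokens(features, adapter):
    """Map one garment feature vector to a list of pseudo-word token embeddings."""
    return [
        [sum(w * x for w, x in zip(row, features)) for row in matrix]
        for matrix in adapter
    ]

def splice_into_prompt(prompt_embeddings, placeholder_index, pseudo_tokens):
    """Replace the placeholder token embedding with the pseudo-word embeddings."""
    return (
        prompt_embeddings[:placeholder_index]
        + pseudo_tokens
        + prompt_embeddings[placeholder_index + 1:]
    )

# Toy sizes (assumed); real visual and token embeddings are far larger.
FEATURE_DIM, TOKEN_DIM, N_PSEUDO = 8, 4, 3

garment_features = [0.1 * i for i in range(FEATURE_DIM)]
adapter = make_adapter(FEATURE_DIM, TOKEN_DIM, N_PSEUDO)
pseudo = garment_to_pseudo_tokens(garment_features, adapter)

# A five-token prompt whose fourth token (index 3) is the garment placeholder.
prompt = [[float(i)] * TOKEN_DIM for i in range(5)]
conditioned = splice_into_prompt(prompt, 3, pseudo)
```

In this sketch the conditioned sequence is what would be handed to the text encoder in place of the original prompt embeddings. In practice the adapter would be trained rather than randomly initialized.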

 

Dress Code

[paper] [code] [dataset]

Publication: 
"Dress Code: High-Resolution Multi-Category Virtual Try-On"
D. Morelli, M. Fincato, M. Cornia, F. Landi, F. Cesari, R. Cucchiara
Proceedings of the European Conference on Computer Vision (ECCV), 2022

Abstract. Image-based virtual try-on strives to transfer the appearance of a clothing item onto the image of a target person. Existing literature focuses mainly on upper-body clothes (e.g., t-shirts, shirts, and tops) and neglects full-body or lower-body items. This shortcoming arises from a primary factor: current publicly available datasets for image-based virtual try-on do not account for this variety, thus limiting progress in the field. In this research activity, we introduce Dress Code, a novel dataset that contains images of multi-category clothes. Dress Code is more than 3x larger than publicly available datasets for image-based virtual try-on and features high-resolution paired images (1024x768) with front-view, full-body reference models. We propose to learn fine-grained discriminating features to generate HD try-on images with high visual quality and rich details. Specifically, we leverage a semantic-aware discriminator that makes predictions at the pixel level instead of the image patch level.

Keywords: Virtual Try-on Dataset, GAN, Generative AI
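
To make the contrast between patch-level and pixel-level discrimination concrete, here is a minimal sketch under stated assumptions. The abstract above only says that the discriminator is semantic-aware and predicts per pixel. The extra "fake" class, the stride of 16, the toy label map, and every function name here are illustrative assumptions, not the published design.

```python
import math

def patch_output_shape(height, width, patch_stride):
    """A patch-level discriminator emits one real/fake score per patch."""
    return (height // patch_stride, width // patch_stride)

def pixel_targets(semantic_map, is_real, fake_class):
    """Per-pixel targets: the semantic label for real images, else the fake class."""
    return [[label if is_real else fake_class for label in row] for row in semantic_map]

def pixel_cross_entropy(logits, targets):
    """Mean per-pixel cross-entropy; logits[y][x] is a list of class scores."""
    total, count = 0.0, 0
    for logit_row, target_row in zip(logits, targets):
        for scores, target in zip(logit_row, target_row):
            peak = max(scores)  # subtract the max for numerical stability
            log_z = peak + math.log(sum(math.exp(s - peak) for s in scores))
            total += log_z - scores[target]
            count += 1
    return total / count

# Assumed setup: 3 semantic classes (indices 0..2) plus one fake class (index 3).
N_SEMANTIC = 3
FAKE_CLASS = N_SEMANTIC

semantic_map = [[0, 1], [2, 1]]  # toy 2x2 label map
real_targets = pixel_targets(semantic_map, is_real=True, fake_class=FAKE_CLASS)
fake_targets = pixel_targets(semantic_map, is_real=False, fake_class=FAKE_CLASS)

# An untrained discriminator with uniform scores over the 4 classes.
uniform_logits = [[[0.0] * (N_SEMANTIC + 1) for _ in row] for row in semantic_map]
loss_on_fake = pixel_cross_entropy(uniform_logits, fake_targets)

# Score grid of a patch discriminator on a 1024x768 image with an assumed stride of 16.
patch_grid = patch_output_shape(1024, 768, 16)
```

The point of the sketch is the output granularity: the patch discriminator yields a coarse grid of scores, while the pixel-level variant yields a class decision at every pixel and can therefore penalize local errors in specific semantic regions.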

Publications

1 Morelli, Davide; Baldrati, Alberto; Cartella, Giuseppe; Cornia, Marcella; Bertini, Marco; Cucchiara, Rita "LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On" MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, Canada, pp. 8580-8589, October 29-November 3, 2023 | DOI: 10.1145/3581783.3612137 | Conference
2 Morelli, Davide; Fincato, Matteo; Cornia, Marcella; Landi, Federico; Cesari, Fabio; Cucchiara, Rita "Dress Code: High-Resolution Multi-Category Virtual Try-On" Proceedings of the European Conference on Computer Vision (Lecture Notes in Computer Science), vol. 13668, Tel Aviv, pp. 345-362, October 23-27, 2022 | DOI: 10.1007/978-3-031-20074-8_20 | Conference
3 Fenocchi, Emanuele; Morelli, Davide; Cornia, Marcella; Baraldi, Lorenzo; Cesari, Fabio; Cucchiara, Rita "Dual-Branch Collaborative Transformer for Virtual Try-On" Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, vol. 2022-, New Orleans, Louisiana, pp. 2246-2250, June 19-24, 2022 | DOI: 10.1109/CVPRW56347.2022.00246 | Conference
4 Morelli, Davide; Fincato, Matteo; Cornia, Marcella; Landi, Federico; Cesari, Fabio; Cucchiara, Rita "Dress Code: High-Resolution Multi-Category Virtual Try-On" Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, vol. 2022-, New Orleans, Louisiana, pp. 2230-2234, June 19-24, 2022 | DOI: 10.1109/CVPRW56347.2022.00243 | Conference
5 Fincato, Matteo; Cornia, Marcella; Landi, Federico; Cesari, Fabio; Cucchiara, Rita "Transform, Warp, and Dress: A New Transformation-Guided Model for Virtual Try-On" ACM Transactions on Multimedia Computing, Communications and Applications, vol. 18, pp. 1-23, 2022 | DOI: 10.1145/3491226 | Journal
6 Fincato, Matteo; Landi, Federico; Cornia, Marcella; Cesari, Fabio; Cucchiara, Rita "VITON-GT: An Image-based Virtual Try-On Model with Geometric Transformations" Proceedings of the 25th International Conference on Pattern Recognition, Milan, Italy, pp. 7669-7676, January 10-15, 2021 | DOI: 10.1109/ICPR48806.2021.9412052 | Conference
