Self-supervised multimodal 3-D garment reconstruction from a single consumer image for energy-efficient virtual try-on systems
Chekhmestruk, Roman (Vinnytsia National Technical University (Ukraine))
Voitsekhovska, Olena (Vinnytsia National Technical University (Ukraine))
| Date: |
2026 |
| Abstract: |
Accurate 3-D reconstruction of garments from a single consumer-grade image remains a critical barrier to truly immersive and resource-aware virtual try-on systems. We introduce a self-supervised, multimodal pipeline that fuses visual tokens extracted by a Vision Transformer with textual garment descriptors to synthesise high-fidelity cloth geometry and texture while operating within the stringent power envelope of mobile neural-processing units (NPUs). A hybrid latent-diffusion module generates pseudo-meshes that supervise a lightweight INT8-quantised Mesh-Autoencoder, thereby eliminating the dependence on large annotated 3-D-scan corpora. To compensate for limited real data we construct SyntheCloth-300K, a dataset blending CLO-3D captures with PhysX-driven synthetic variations, and use it for joint visual-textual training. On the DeepFashion3D benchmark our method reduces Chamfer Distance by 18% and improves SSIM by 0.03 over DressCode-NeRF, while sustaining 21 FPS at 0.32 mJ·vertex⁻¹ on a Snapdragon 8 Gen 3, tripling the energy efficiency of prior art. Qualitative results reveal robust reconstruction of fine pleats and fabric drape, even under severe self-occlusion. The proposed framework thus bridges computer vision, physically based graphics, and embedded optimisation, laying the groundwork for next-generation, on-device virtual fitting applications. |
| Rights: |
This document is subject to a Creative Commons licence. Total or partial reproduction, distribution, and public communication of the work are permitted, provided it is not for commercial purposes and the authorship of the original work is acknowledged. The creation of derivative works is not permitted. |
| Language: |
English |
| Document: |
Article ; research ; Published version |
| Subject: |
Multimodal learning ;
Self-supervised diffusion ;
Garment reconstruction ;
Energy-efficient inference ;
Virtual try-on |
| Published in: |
ELCVIA, Vol. 25, Num. 1 (2026), p. 60-82 (Regular Issue), ISSN 1577-5097 |
Original address: https://elcvia.cvc.uab.cat/article/view/2276
Alternative address: https://raco.cat/index.php/ELCVIA/article/view/980000007326
DOI: 10.5565/rev/elcvia.2276
The record appears in the collections:
Articles >
Published articles >
ELCVIA Articles >
Research articles
Record created on 2026-04-10, last modified on 2026-04-19