Self-supervised multimodal 3-D garment reconstruction from a single consumer image for energy-efficient virtual try-on systems
Chekhmestruk, Roman (Vinnytsia National Technical University (Ukraine))
Voitsekhovska, Olena (Vinnytsia National Technical University (Ukraine))
| Date: |
2026 |
| Abstract: |
Accurate 3-D reconstruction of garments from a single consumer-grade image remains a critical barrier to truly immersive and resource-aware virtual try-on systems. We introduce a self-supervised, multimodal pipeline that fuses visual tokens extracted by a Vision Transformer with textual garment descriptors to synthesise high-fidelity cloth geometry and texture while operating within the stringent power envelope of mobile neural-processing units (NPUs). A hybrid latent-diffusion module generates pseudo-meshes that supervise a lightweight INT8-quantised Mesh-Autoencoder, thereby eliminating the dependence on large annotated 3-D-scan corpora. To compensate for limited real data we construct SyntheCloth-300K, a dataset blending CLO-3D captures with PhysX-driven synthetic variations, and use it for joint visual-textual training. On the DeepFashion3D benchmark our method reduces Chamfer Distance by 18% and improves SSIM by 0.03 over DressCode-NeRF, while sustaining 21 FPS at 0.32 mJ·vertex⁻¹ on a Snapdragon 8 Gen 3, tripling the energy efficiency of prior art. Qualitative results reveal robust reconstruction of fine pleats and fabric drape, even under severe self-occlusion. The proposed framework thus bridges computer vision, physically based graphics, and embedded optimisation, laying the groundwork for next-generation, on-device virtual fitting applications. |
| Rights: |
This document is subject to a Creative Commons licence. Total or partial reproduction, distribution, and public communication of the work are permitted, provided it is not for commercial purposes and the authorship of the original work is acknowledged. The creation of derivative works is not permitted. |
| Language: |
English |
| Document: |
Article ; research ; Published version |
| Subject: |
Multimodal learning ;
Self-supervised diffusion ;
Garment reconstruction ;
Energy-efficient inference ;
Virtual try-on |
| Published in: |
ELCVIA, Vol. 25, Num. 1 (2026), p. 60-82 (Regular Issue), ISSN 1577-5097 |
Original address: https://elcvia.cvc.uab.cat/article/view/2276
Alternative address: https://raco.cat/index.php/ELCVIA/article/view/980000007326
DOI: 10.5565/rev/elcvia.2276
The record appears in the collections:
Articles >
Published articles >
ELCVIA Articles >
Research articles
Record created on 2026-04-10, last modified on 2026-04-19