Self-supervised multimodal 3-D garment reconstruction from a single consumer image for energy-efficient virtual try-on systems
Chekhmestruk, Roman (Vinnytsia National Technical University (Ukraine))
Voitsekhovska, Olena (Vinnytsia National Technical University (Ukraine))

Date: 2026
Abstract: Accurate 3-D reconstruction of garments from a single consumer-grade image remains a critical barrier to truly immersive and resource-aware virtual try-on systems. We introduce a self-supervised, multimodal pipeline that fuses visual tokens extracted by a Vision Transformer with textual garment descriptors to synthesise high-fidelity cloth geometry and texture while operating within the stringent power envelope of mobile neural-processing units (NPUs). A hybrid latent-diffusion module generates pseudo-meshes that supervise a lightweight INT8-quantised Mesh-Autoencoder, thereby eliminating the dependence on large annotated 3-D-scan corpora. To compensate for limited real data we construct SyntheCloth-300K, a dataset blending CLO-3D captures with PhysX-driven synthetic variations, and use it for joint visual-textual training. On the DeepFashion3D benchmark our method reduces Chamfer Distance by 18% and improves SSIM by 0.03 over DressCode-NeRF, while sustaining 21 FPS at 0.32 mJ·vertex⁻¹ on a Snapdragon 8 Gen 3, tripling the energy efficiency of prior art. Qualitative results reveal robust reconstruction of fine pleats and fabric drape, even under severe self-occlusion. The proposed framework thus bridges computer vision, physically based graphics, and embedded optimisation, laying the groundwork for next-generation, on-device virtual fitting applications.
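The abstract's headline geometry metric, Chamfer Distance, measures the average nearest-neighbour discrepancy between a reconstructed mesh's vertices and a ground-truth point cloud. As a minimal illustration only (not the authors' implementation, whose mesh sampling and normalisation are not specified here), a symmetric squared Chamfer Distance can be sketched in NumPy:

```python
import numpy as np

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric squared Chamfer Distance between point clouds a (N,3) and b (M,3)."""
    # Pairwise squared distances via broadcasting, shape (N, M)
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    # Mean nearest-neighbour distance in both directions
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

# Identical clouds score exactly zero
pts = np.random.rand(100, 3)
print(chamfer_distance(pts, pts))  # → 0.0
```

An 18% reduction in this quantity, as reported on DeepFashion3D, means the reconstructed surface lies on average substantially closer to the scanned garment geometry.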
Rights: This document is subject to a Creative Commons use licence. Total or partial reproduction, distribution, and public communication of the work are permitted, provided it is not for commercial purposes and the authorship of the original work is acknowledged. Creation of derivative works is not permitted. Creative Commons
Language: English
Document: Article ; research ; Published version
Subject: Multimodal learning ; Self-supervised diffusion ; Garment reconstruction ; Energy-efficient inference ; Virtual try-on
Published in: ELCVIA, Vol. 25, No. 1 (2026), p. 60-82 (Regular Issue), ISSN 1577-5097

Original address: https://elcvia.cvc.uab.cat/article/view/2276
Alternative address: https://raco.cat/index.php/ELCVIA/article/view/980000007326
DOI: 10.5565/rev/elcvia.2276


23 p, 16.6 MB

The record appears in the collections:
Articles > Published articles > ELCVIA
Articles > Research articles

Record created 2026-04-10, last modified 2026-04-19


