Minimizing Human Labeling in Training Deep Models for Pedestrian Intention Prediction

Riaz, Muhammad Naveed; Wielgosz, Maciej; López Peña, Antonio M.

doi:10.1109/TITS.2025.3565667

Permanent link: https://ddd.uab.cat/record/313066

Web of Science: 1 citation, Scopus: 1 citation

Riaz, Muhammad Naveed (Universitat Autònoma de Barcelona. Departament de Ciències de la Computació)
Wielgosz, Maciej (Norwegian Institute of Bioeconomy)
López Peña, Antonio M. (Centre de Visió per Computador)

Date:	2025
Abstract:	Accurately predicting whether pedestrians will cross in front of an autonomous vehicle is essential for ensuring safe and comfortable maneuvers. However, developing models for this task remains challenging due to the limited availability of diverse datasets containing both crossing (C) and non-crossing (NC) scenarios. Therefore, we propose a procedure that leverages synthetic videos with C/NC labels and an untrained model whose architecture is designed for C/NC prediction to automatically produce C/NC labels for a set of real-world videos. Thus, this procedure performs a synth-to-real unsupervised domain adaptation for C/NC prediction, so we term it S2R-UDA-CP. To assess the effectiveness of S2R-UDA-CP in self-labeling, we utilize two state-of-the-art models, PedGNN and ST-CrossingPose, and we rely on the publicly available PedSynth dataset, which consists of synthetic videos with C/NC labels. Notably, once the real-world videos are self-labeled, they can be used to train models different from those used in S2R-UDA-CP. These models are designed to operate onboard a vehicle, whereas S2R-UDA-CP is an offline procedure. To evaluate the quality of the C/NC labels generated by S2R-UDA-CP, we also employ PedGraph+ (another reference model from the literature), as it is not used in S2R-UDA-CP. Overall, the results show that models trained to predict C/NC using videos labeled by S2R-UDA-CP perform even better than models trained on human-labeled data. Our study also highlights several discrepancies between automatic and human labeling. To the best of our knowledge, this is the first study to evaluate synth-to-real self-labeling for C/NC prediction.
Grants:	Agencia Estatal de Investigación PID2020-115734RB-C21 ; Generalitat de Catalunya 2021/FI-SDUR-00281 ; Agència de Gestió d'Ajuts Universitaris i de Recerca 2021/SGR-01621 ; European Commission 801342
Note:	Other funding: UAB transformative agreements
Rights:	This document is subject to a Creative Commons license. Full or partial reproduction, distribution, public communication of the work, and the creation of derivative works are permitted, even for commercial purposes, provided that the authorship of the original work is acknowledged.
Language:	English
Document:	Article ; research ; Published version
Subject:	Pedestrians ; Videos ; Predictive models ; Labeling ; Training ; Data models ; Computational modeling ; Skeleton ; Adaptation models ; Synthetic data
Published in:	IEEE Transactions on Intelligent Transportation Systems, 2025, ISSN 1558-0016

12 p, 4.9 MB

The record appears in these collections:
Articles > Research articles
Articles > Published articles

Record created 2025-06-25, last modified 2025-12-10
