Optimizing Predicate Detection : a novel approach using ResNet152 architecture and NAdam optimizer

Ouazzani Chahdi, Meryem; Ouazzani Chahdi, Adnane; Annich, Afafe; Satori, Khalid; El Abderrahmani, Abdellatif

doi:10.5565/rev/elcvia.2131

Permanent link: https://ddd.uab.cat/record/328001

Ouazzani Chahdi, Meryem (Sidi Mohamed Ben Abdellah University (Morocco))
Ouazzani Chahdi, Adnane (Sidi Mohamed Ben Abdellah University (Morocco))
Annich, Afafe (Sidi Mohamed Ben Abdellah University (Morocco))
Satori, Khalid (Sidi Mohamed Ben Abdellah University (Morocco))
El Abderrahmani, Abdellatif (Sidi Mohamed Ben Abdellah University (Morocco))

Date:	2026
Abstract:	Visual relationship detection is crucial for semantic scene comprehension, impacting various fields, including human behavior analysis, visual navigation, medicine, and security. A key challenge in this domain is handling partially visible and occluded objects, which complicate the accurate detection of triplet relationships. This study introduces a novel approach to address these challenges by employing the ResNet152 model as the backbone of the Faster R-CNN framework, paired with the Nadam optimizer, neither of which has previously been applied in this domain. Our methodology integrates region proposals and extracted features into a Graph Neural Network to construct a scene graph for each image. Additionally, we use a pre-trained Word2Vec model to encode subject, object, and predicate labels. To validate our approach, we conduct comparative studies evaluating the performance of ResNet152 against the widely used ResNet101 model and the EfficientNet-B7 model, which is known for its efficiency in image classification but has not yet been explored in this context. We also compare the Nadam optimizer with the commonly adopted Adam optimizer. Our analysis focuses on the impact of these models on predicate detection and includes performance evaluations against current state-of-the-art methods. Experiments on the open-access VRD dataset demonstrate that our ResNet152-Nadam combination achieves superior recall metrics, underscoring the importance of model depth and optimizer selection in enhancing predicate detection. This approach shows significant potential for advancing applications in visual relationship detection, particularly in complex real-world scenarios where relationships may often be unseen.
Rights:	This document is subject to a Creative Commons license. Total or partial reproduction, distribution, and public communication of the work are permitted, provided that it is not for commercial purposes and that the authorship of the original work is acknowledged. The creation of derivative works is not permitted.
Language:	English
Document:	Article ; research ; Published version
Subject:	Semantic scene understanding ; Visual relationships detection (VRD) ; ResNet152 backbone ; Graph neural networks (GNNs) ; Nadam optimizer ; Word2Vec encoding
Published in:	ELCVIA, Vol. 25, Num. 1 (2026) , p. 98-123 (Regular Issue) , ISSN 1577-5097

Original address: https://elcvia.cvc.uab.cat/article/view/2131

27 p, 4.9 MB

Record created 2026-04-27, last modified 2026-04-29