Optimizing Predicate Detection: a novel approach using ResNet152 architecture and NAdam optimizer
Ouazzani Chahdi, Meryem (Sidi Mohamed Ben Abdellah University (Morocco))
Ouazzani Chahdi, Adnane (Sidi Mohamed Ben Abdellah University (Morocco))
Annich, Afafe (Sidi Mohamed Ben Abdellah University (Morocco))
Satori, Khalid (Sidi Mohamed Ben Abdellah University (Morocco))
El Abderrahmani, Abdellatif (Sidi Mohamed Ben Abdellah University (Morocco))
| Date: |
2026 |
| Abstract: |
Visual relationship detection is crucial for semantic scene comprehension, impacting various fields, including Human Behavior Analysis, Visual Navigation, Medicine, and Security. A key challenge in this domain is managing partially visible and occluded objects, which complicate the accurate detection of triplet relationships. This study introduces a novel approach to address these challenges by employing the ResNet152 model as the backbone for the Faster R-CNN framework, paired with the Nadam optimizer, neither of which has previously been applied in this domain. Our methodology integrates region proposals and extracted features into a Graph Neural Network to construct a scene graph for each image. Additionally, we utilize a pre-trained Word2Vec model to encode subject, object, and predicate labels. To validate our approach, we conduct comparative studies evaluating the performance of ResNet152 against the widely used ResNet101 model and the EfficientNet-B7 model, which is known for its efficiency in image classification but has not yet been explored in this context. We also compare the Nadam optimizer with the commonly adopted Adam optimizer. Our analysis focuses on the impact of these models on predicate detection and includes performance evaluations against current state-of-the-art methods. Experiments on the open-access VRD dataset demonstrate that our ResNet152-Nadam combination achieves superior recall metrics, underscoring the importance of model depth and optimizer selection in enhancing predicate detection. This approach shows significant potential for advancing applications in visual relationship detection, particularly in complex real-world scenarios where relationships may often be unseen. |
| Rights: |
This document is subject to a Creative Commons license. Total or partial reproduction, distribution, and public communication of the work are permitted, provided that it is not for commercial purposes and that the authorship of the original work is acknowledged. The creation of derivative works is not permitted. |
| Language: |
English |
| Document: |
Article ; research ; Published version |
| Subject: |
Semantic scene understanding ;
Visual relationships detection (VRD) ;
ResNet152 backbone ;
Graph neural networks (GNNs) ;
Nadam optimizer ;
Word2Vec encoding |
| Published in: |
ELCVIA, Vol. 25, Num. 1 (2026) , p. 98-123 (Regular Issue) , ISSN 1577-5097 |
Original address: https://elcvia.cvc.uab.cat/article/view/2131
DOI: 10.5565/rev/elcvia.2131
Record created 2026-04-27, last modified 2026-04-29