Improving slow-moving object detection in complex environments using a feature pooling enhanced encoder-decoder model: EDM-SMOD
Panigrahi, Upasana (C V Raman Global University (India))
Sahoo, Prabodh Kumar (Parul University (India))
Panda, Manoj Kumar (GIET University (India))
Panda, Ganapati (C V Raman Global University (India))
| Date: |
2025 |
| Abstract: |
The ability to detect moving objects is of great importance in a wide range of visual surveillance systems, playing a vital role in maintaining security and ensuring effective monitoring. The primary aim of such systems is to detect objects in motion while tackling real-world challenges effectively. Despite the existence of numerous methods, there remains room for improvement, particularly for video sequences containing slowly moving objects and for unfamiliar video environments. When slow-moving objects are confined to a small region of the frame, many traditional methods fail to detect the entire object; an effective solution is a spatial-temporal framework, and the selection of the temporal, spatial, and fusion algorithms is crucial for effectively detecting slow-moving objects. This article addresses the detection of slowly moving objects in challenging videos by leveraging an encoder-decoder architecture that incorporates a modified VGG-16 model with a feature pooling framework. Several novel aspects characterize the proposed algorithm. It utilizes a pre-trained modified VGG-16 network as the encoder, employing transfer learning to enhance model efficacy; the encoder is designed with a reduced number of layers and incorporates skip connections to extract the fine- and coarse-scale features essential for local change detection. The feature pooling framework (FPF) combines layers of different types, including max pooling, standard convolution, and several atrous convolutions with varying sampling rates; this integration preserves features of various dimensions, ensuring their representation across a wide range of scales. The decoder network comprises stacked convolutional layers that effectively map features back to image space. The performance of the developed technique is assessed against various existing methods, including CMRM, the Hybrid algorithm, Fast valley, EPMCB, and MODCVS, and its effectiveness is demonstrated through both subjective and objective analyses. It achieves superior performance, with an average F-measure (AF) of 98.86% and a lower average misclassification error (AMCE) of 0.85. Furthermore, the algorithm's effectiveness is validated on Imperceptible Video Configuration setups, where it also exhibits superior performance. |
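To make the multi-scale fusion described in the abstract concrete, the following is a minimal, illustrative sketch of a feature pooling framework: parallel max-pooling, standard convolution, and atrous convolutions at varying sampling rates, concatenated and projected to preserve features across scales. This is not the authors' code; it assumes PyTorch, and the class name, channel widths, and dilation rates (6, 12, 18) are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

class FeaturePoolingFramework(nn.Module):
    """Illustrative multi-branch feature pooling block (name assumed)."""
    def __init__(self, in_ch: int, out_ch: int, rates=(6, 12, 18)):
        super().__init__()
        # 1x1 convolution branch.
        self.conv1x1 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        # Atrous (dilated) 3x3 branches; padding == dilation keeps H x W.
        self.atrous = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
            for r in rates)
        # Max-pooling branch; stride 1 with padding 1 keeps the spatial size.
        self.pool = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        # Fuse the concatenated branch outputs back to out_ch channels.
        self.project = nn.Conv2d((2 + len(rates)) * out_ch, out_ch, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [self.conv1x1(x), self.pool(x)] + [b(x) for b in self.atrous]
        return self.project(torch.cat(feats, dim=1))

# Example: insert between a VGG-16-style encoder (512 channels at its
# deepest stage) and a stacked-convolution decoder.
fpf = FeaturePoolingFramework(in_ch=512, out_ch=256)
out = fpf(torch.randn(1, 512, 30, 40))  # -> torch.Size([1, 256, 30, 40])
```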
| Rights: |
This document is subject to a Creative Commons license. Total or partial reproduction, distribution, and public communication of the work are permitted, provided it is not for commercial purposes and the authorship of the original work is acknowledged. The creation of derivative works is not permitted. |
| Language: |
English |
| Document: |
Article ; research ; Published version |
| Subject: |
Background subtraction ;
Deep neural network ;
Transfer learning ;
Slow-moving object ;
Feature pooling framework ;
Encoder-decoder type network |
| Published in: |
ELCVIA. Electronic Letters on Computer Vision and Image Analysis, Vol. 24, No. 2 (2025), p. 49-69 (Regular Issue), ISSN 1577-5097 |
Original address: https://elcvia.cvc.uab.cat/article/view/2023
DOI: 10.5565/rev/elcvia.2023