A Learnheuristic Algorithm Based on Thompson Sampling for the Heterogeneous and Dynamic Team Orienteering Problem

Rodríguez Uguina, Antonio; Gómez González, Juan Francisco; Panadero, Javier; Martínez-Gavara, Anna; Juan, Ángel A

doi:10.3390/math12111758

Permanent link: https://ddd.uab.cat/record/304583

Rodríguez Uguina, Antonio (Universitat Politècnica de València)
Gómez González, Juan Francisco (Universitat Politècnica de València)
Panadero, Javier (Universitat Autònoma de Barcelona. Departament d'Arquitectura de Computadors i Sistemes Operatius)
Martínez-Gavara, Anna (Universitat de València)
Juan, Ángel A (Universitat Politècnica de València)

Date:	2024
Abstract:	The team orienteering problem (TOP) is a well-studied optimization challenge in the field of Operations Research, where multiple vehicles aim to maximize the total collected rewards within a given time limit by visiting a subset of nodes in a network. With the goal of including dynamic and uncertain conditions inherent in real-world transportation scenarios, we introduce a novel dynamic variant of the TOP that considers real-time changes in environmental conditions affecting reward acquisition at each node. Specifically, we model the dynamic nature of environmental factors (such as traffic congestion, weather conditions, and the battery level of each vehicle) to reflect their impact on the probability of obtaining the reward when visiting each type of node in a heterogeneous network. To address this problem, a learnheuristic optimization framework is proposed. It combines a metaheuristic algorithm with Thompson sampling to make informed decisions in dynamic environments. Furthermore, we conduct empirical experiments to assess the impact of varying reward probabilities on resource allocation and route planning within the context of this dynamic TOP, where nodes might offer a different reward behavior depending upon the environmental conditions. Our numerical results indicate that the proposed learnheuristic algorithm outperforms static approaches, achieving up to 25% better performance in highly dynamic scenarios. Our findings highlight the effectiveness of our approach in adapting to dynamic conditions and optimizing decision-making processes in transportation systems.
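Illustrative sketch (not taken from the article): the abstract states that Thompson sampling is used to learn the probability of obtaining a reward for each node type under given environmental conditions. One common way to realize this idea is a Beta-Bernoulli model, shown below in Python. Everything in it is an assumption made for illustration: the class and function names, the uniform Beta(1, 1) prior, the binary reward outcome, the (node type, condition) key, and the greedy selection step, which merely stands in for the article's metaheuristic.

```python
import random
from collections import defaultdict


class BetaBernoulliSampler:
    """Thompson sampling with one Beta posterior per (node_type, condition) key.

    Hypothetical helper: assumes the reward at a node is either obtained or not,
    and starts every key from a uniform Beta(1, 1) prior.
    """

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.alpha = defaultdict(lambda: 1.0)  # 1 + number of observed successes
        self.beta = defaultdict(lambda: 1.0)   # 1 + number of observed failures

    def sample(self, key):
        """Draw one plausible success probability from the posterior of `key`."""
        return self.rng.betavariate(self.alpha[key], self.beta[key])

    def update(self, key, success):
        """Record whether the reward was actually obtained at a visited node."""
        if success:
            self.alpha[key] += 1.0
        else:
            self.beta[key] += 1.0

    def mean(self, key):
        """Posterior mean of the success probability for `key`."""
        a, b = self.alpha[key], self.beta[key]
        return a / (a + b)


def pick_next_node(sampler, candidates, condition):
    """Return the candidate with the highest sampled expected reward.

    `candidates` is a list of (node_id, node_type, reward) tuples. This greedy
    rule is a placeholder for a route-construction step, not the article's method.
    """
    return max(
        candidates,
        key=lambda c: c[2] * sampler.sample((c[1], condition)),
    )
```

In such a scheme, a vehicle would call `pick_next_node` while building its route and `update` after each visit, so that node types that rarely pay off under the current conditions are gradually sampled with lower probabilities while still being explored occasionally.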
Grants:	Agencia Estatal de Investigación PID2022-138860NB-I00 ; Agencia Estatal de Investigación RED2022-134703-T ; European Commission 101092612 ; European Commission 101057294
Rights:	This document is subject to a Creative Commons license. Reproduction in whole or in part, distribution, public communication of the work, and the creation of derivative works are permitted, including for commercial purposes, provided that the authorship of the original work is acknowledged.
Language:	English
Document:	Article ; research ; Published version
Subject:	Combinatorial optimization ; Learnheuristics ; Reinforcement learning ; Team orienteering problem ; SDG 9 - Industry, Innovation, and Infrastructure
Published in:	Mathematics, Vol. 12, Issue 11 (June 2024), art. 1758, ISSN 2227-7390

19 p, 665.1 KB

The record appears in these collections:
Articles > Research articles
Articles > Published articles

Record created 2024-12-13, last modified 2025-03-07
