Machine learning for the analysis of healthy lifestyle data : a scoping review and guidelines (Preprint)

Estrella Arraez, Antonio; Capdevila Ortís, Lluís; Alfonso, Carla; Losilla Vidal, Josep Maria

Bibliographic citation -- Permanent link: https://ddd.uab.cat/record/323957

Estrella Arraez, Antonio (Universitat Autònoma de Barcelona. Departament de Psicologia Bàsica, Evolutiva i de l'Educació)
Capdevila Ortís, Lluís (Universitat Autònoma de Barcelona. Departament de Psicologia Bàsica, Evolutiva i de l'Educació)
Alfonso, Carla (Universitat Autònoma de Barcelona. Departament de Psicologia Bàsica, Evolutiva i de l'Educació)
Losilla Vidal, Josep Maria (Universitat Autònoma de Barcelona. Institut de Recerca de l'Esport)
Universitat Autònoma de Barcelona. Institut de Recerca de l'Esport
Universitat Autònoma de Barcelona. Departament de Psicobiologia i de Metodologia de les Ciències de la Salut

Date:	2025
Description:	41 pages
Abstract:	Background: Advances in data science and technology have transformed lifestyle studies by enabling the integration of multimodal information and the generation of large volumes of data. Despite the growing interest in machine learning (ML) in health behaviour research, significant methodological gaps remain. Objectives: The study aims to systematically review the applications of supervised ML algorithms in analyzing healthy lifestyle (HL) data, with a specific focus on the methodological approach employed. The specific objectives are to explore the types and sources of data used in relation to health outcomes, examine the ML processes employed, including explainable artificial intelligence (XAI) methods, and review the software tools utilized. Additionally, this review aims to provide practical guidelines to enhance the quality and transparency of future ML research in health. Methods: Following the PRISMA-ScR recommendations, the search was conducted across PubMed, PsycINFO, and Web of Science, resulting in 48 studies that met the inclusion criteria. Results: Most studies (37, 77.08%) integrated multidomain data from physical activity, diet, sleep, and stress. Data sources were split between self-acquired (25, 52.08%) and health repositories (23, 47.92%). Single-item measurements were common, particularly for physical activity, diet, and sleep. Despite a multimodel approach in 28 studies, random forest was the most frequently used algorithm. Only 10 studies (20.83%) employed XAI methods, with 9 using SHapley Additive exPlanations (SHAP) values and 1 using Local Interpretable Model-agnostic Explanations (LIME). R was the most widely used software, with variations in the libraries employed. Conclusion: This review highlights methodological gaps in the application of supervised ML to HL data. The ML workflow should span from data acquisition to explainability, with iterative steps to improve the process. Multidomain approaches in data acquisition enhance understanding of health issues related to lifestyle but are constrained by low data representativeness due to methodological limitations in acquisition. While random forest was prevalent, a multimodel approach is recommended for comprehensive comparison. Lifestyle components consistently ranked among the top features in studies that incorporated XAI. Integrating XAI methods into the ML pipeline can support personalized interventions, provided the data is accurately collected. The R metapackage tidymodels facilitates process evaluation through unified syntax, improving replicability. Methodological and reporting guidelines are provided to enhance transparency and replicability in multidisciplinary ML research.
Funding:	Agencia Estatal de Investigación PID2019-107473RB-C21 ; Agencia Estatal de Investigación PID2022-141403NB-I00 ; Generalitat de Catalunya 2021/SGR-00806
Rights:	This document is subject to a Creative Commons license. Total or partial reproduction, distribution, public communication of the work, and the creation of derivative works are permitted, even for commercial purposes, provided that the authorship of the original work is acknowledged.
Language:	English
Document:	Preprint ; research ; Version submitted for review
Subject:	Machine learning ; Artificial intelligence ; Healthy lifestyle ; Physical activity ; Diet ; Sleep ; Stress ; Review ; Data analysis ; XAI
Published in:	JMIR Human Factors, 2025, p. 1-65, ISSN 2292-9495

Original address: https://preprints.jmir.org/preprint/78648

Preprint
41 p, 2.0 MB

This record appears in the collections:
Research documents > Preprints

Record created on 2026-02-18, last modified on 2026-03-29