Web of Science: 3 citations, Scopus: 8 citations
A Critical Path File Location (CPFL) algorithm for data-aware multiworkflow scheduling on HPC clusters
Acevedo Giménez, César Esteban (Universitat Autònoma de Barcelona. Departament d'Arquitectura de Computadors i Sistemes Operatius)
Hernández Budé, Porfidio (Universitat Autònoma de Barcelona. Departament d'Arquitectura de Computadors i Sistemes Operatius)
Espinosa, Antonio (Universitat Autònoma de Barcelona. Departament d'Arquitectura de Computadors i Sistemes Operatius)
Méndez Muñoz, Víctor (Universitat Autònoma de Barcelona. Departament d'Arquitectura de Computadors i Sistemes Operatius)

Date: 2017
Abstract: A representative set of workflows found in bioinformatics pipelines must deal with large data sets. Most scientific workflows are defined as Directed Acyclic Graphs (DAGs). Although DAGs are useful for understanding dependence relationships, they do not provide any information about input, output and temporary data files. For data-intensive applications, this information about file location helps to avoid performance issues. This paper presents the Critical Path File Location (CPFL) policy, a multiworkflow store-aware scheduler for cluster environments in which disk access time is more relevant than network access time; it extends the classical list scheduling policies. Our purpose is to find the best location of data files in a hierarchical storage system. The resulting algorithm is tested in an HPC cluster and in a simulated cluster scenario with synthetic bioinformatics workflows and widely used benchmarks such as Montage and Epigenomics. The resulting simulator is tuned and validated with the first test results from the real infrastructure. The evaluation of our proposal shows promising results: up to 70% improvement on benchmarks in real HPC clusters using 128 cores and up to 69% makespan improvement on simulated 512-core clusters, with a deviation between 0.9% and 3% relative to the real HPC cluster.
Grants: European Commission 654142
Ministerio de Economía y Competitividad TIN2014-53234-C2-1-R
Rights: This document is subject to a Creative Commons licence. Total or partial reproduction and public communication of the work are permitted, provided that it is not for commercial purposes and that the authorship of the original work is acknowledged. The creation of derivative works is not permitted.
Language: English
Document: Article ; research ; Published version
Subject: Multiworkflows ; Cluster ; Scheduler ; Simulation ; Critical path ; Data processing
Published in: Future generation computer systems, Vol. 74 (Sep. 2017), p. 51-62, ISSN 0167-739X

DOI: 10.1016/j.future.2017.04.025


12 p, 1.7 MB

The record appears in these collections:
Articles > Research articles
Articles > Published articles

Record created 2024-01-26, last modified 2024-02-27


