Predicting robustness against transient faults of MPI based programs
Dias Lima Gramacho, João Artur (Universitat Autònoma de Barcelona. Departament d'Arquitectura de Computadors i Sistemes Operatius)
Wong, Álvaro (Universitat Autònoma de Barcelona. Departament d'Arquitectura de Computadors i Sistemes Operatius)
Rexachs del Rosario, Dolores Isabel (Universitat Autònoma de Barcelona. Departament d'Arquitectura de Computadors i Sistemes Operatius)
Luque, Emilio (Universitat Autònoma de Barcelona. Departament d'Arquitectura de Computadors i Sistemes Operatius)

Date: 2016
Abstract: Evaluating a program's behaviour in the presence of transient faults is often very time-consuming. To obtain significant data, thousands of executions are required, and each execution incurs the significant overhead of the fault injection environment. A previously published methodology significantly reduced the time needed to evaluate the robustness of a program execution by exhaustively analysing its execution trace instead of using fault injection. In this paper we present a further improvement in the time needed to evaluate the robustness of parallel programs against transient faults by combining this methodology with PAS2P, a method that strives to describe an application based on its message-passing activity. This combination allowed us to predict the robustness of larger parallel programs, in some cases reducing the time needed to calculate the robustness by more than 20 times while obtaining a robustness prediction error of less than 4%.
Grants: Ministerio de Ciencia e Innovación TIN2007-64974
Note: Other grants: MINETUR/TSI-020400-2010-120
Rights: All rights reserved.
Language: English
Document: Article ; research ; Version accepted for publication
Subject: Transient faults ; Robustness prediction ; Soft errors ; Reliability ; Parallel application signature ; Performance prediction ; PAS2P ; MPI ; Message passing interface ; Program execution ; Parallel programs
Published in: International journal of computational science and engineering, Vol. 12, Issue 2/3 (2016), p. 155-165, ISSN 1742-7185

DOI: 10.1504/IJCSE.2016.076218


Post-print
9 p, 1.1 MB

This record appears in the collections:
Research documents > Documents of the UAB research groups > Research centres and groups (scientific output) > Engineering > HPC4EAS (High Performance Computing for Efficient Applications and Simulation Research Group)
Articles > Research articles
Articles > Published articles

Record created 2016-07-11, last modified 2022-07-23


