Performance Optimization using Multimodal Modeling and Heterogeneous GNN

Dutta, Akash; Alcaraz, Jordi; Tehranijamsaz, Ali; César Galobardes, Eduardo; Sikora, Anna; Jannesari, Ali

doi:10.1145/3588195.3592984

Permanent link: https://ddd.uab.cat/record/317853

Dutta, Akash (Iowa State University)
Alcaraz, Jordi (University of Oregon)
Tehranijamsaz, Ali (Iowa State University)
César Galobardes, Eduardo (Universitat Autònoma de Barcelona)
Sikora, Anna (Universitat Autònoma de Barcelona)
Jannesari, Ali (Iowa State University)

Publication:	New York : Association for Computing Machinery, 2023
Description:	13 pages
Abstract:	Growing heterogeneity and configurability in HPC architectures have made auto-tuning applications and runtime parameters on these systems very complex. Users are presented with a multitude of options to configure parameters. In addition to application-specific solutions, a common approach is to use general-purpose search strategies, which often fail to identify the best configurations or whose time to convergence is a significant barrier. There is, thus, a need for a general-purpose and efficient tuning approach that can be easily scaled and adapted to various tuning tasks. We propose a technique for tuning parallel code regions that is general enough to be adapted to multiple tasks. In this paper, we analyze IR-based programming models to make task-specific performance optimizations. To this end, we propose the Multimodal Graph Neural Network and Autoencoder (MGA) tuner, a multimodal deep-learning-based approach that adapts Heterogeneous Graph Neural Networks and Denoising Autoencoders for modeling IR-based code representations that serve as separate modalities. This approach is used as part of our pipeline to model a syntax-, semantics-, and structure-aware IR-based code representation for tuning parallel code regions/kernels. We experiment extensively on OpenMP and OpenCL code regions/kernels obtained from the PolyBench, Rodinia, STREAM, DataRaceBench, AMD SDK, NPB, NVIDIA SDK, Parboil, SHOC, LULESH, XSBench, RSBench, miniFE, miniAMR, and Quicksilver benchmarks and applications. We apply our multimodal learning techniques to the tasks of (i) optimizing the number of threads, scheduling policy, and chunk size in OpenMP loops and (ii) identifying the best device for heterogeneous device mapping of OpenCL kernels. Our experiments show that this multimodal-learning-based approach outperforms the state of the art in almost all experiments.
Grants:	Agencia Estatal de Investigación PID2020-113614RB-C21 ; Agència de Gestió d'Ajuts Universitaris i de Recerca 2021/SGR-00574
Rights:	This document is subject to a Creative Commons license. Total or partial reproduction, distribution, public communication of the work, and the creation of derivative works are permitted, even for commercial purposes, provided that the authorship of the original work is acknowledged.
Language:	English
Document:	Book chapter ; research ; Published version
Subject:	Auto-tuning ; Heterogeneous graph neural networks ; Multimodal learning ; OpenCL ; OpenMP
Published in:	HPDC '23: Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, 2023, p. 45-57, ISBN 979-8-4007-0155-9

13 p, 1.4 MB

This record appears in the collections:
Books and collections > Book chapters

Record created 2025-07-22, last modified 2025-07-28
