Exploring efficient data parallelism for genome read mapping on multicore and manycore architectures

Chen, Shaolong; Senar Rosell, Miquel

doi:10.1016/j.parco.2019.04.014

Bibliographic citation -- Permanent link: https://ddd.uab.cat/record/223865

Web of Science: 2 citations, Scopus: 5 citations, Google Scholar: citations

Chen, Shaolong (Universitat Autònoma de Barcelona. Departament d'Arquitectura de Computadors i Sistemes Operatius)
Senar Rosell, Miquel (Universitat Autònoma de Barcelona. Departament d'Arquitectura de Computadors i Sistemes Operatius)

Date:	2019
Abstract:	Heterogeneous architectures that combine multicore and manycore systems have become attractive solutions for coping with the data boom in genomics-based studies. Our work explores the efficient use of heterogeneous architectures in this area. In particular, we have studied manycore components such as the Xeon Phi accelerator, which has proved to be a convenient choice because it allows easy migration of applications developed for multicore servers based on the x86 architecture. Our study focuses on sequence alignment, one of the fundamental and most computationally costly stages in most genome variant studies. We concentrate on BWA, one of the most popular sequence aligners, and on three types of heterogeneous systems: one containing Intel multicore CPUs and accelerators, one made up of several multicore servers, and one large-scale system, each with different characteristics in terms of number of CPUs, number of cores, and memory organization. Although sequence alignment fits the embarrassingly parallel pattern, achieving good performance and scalability in heterogeneous environments can be complex. We analyzed different strategies based on distributing data and replicating certain data structures, and found that the MDPR (Multi-level Data Parallelization and Replication) strategy gave the best results on all the heterogeneous platforms tested. Its results surpassed those of other strategies proposed in the literature and showed that it can be used in different heterogeneous environments without specific adjustments to the underlying architecture. In the design of MDPR, different static and dynamic data distribution strategies were also evaluated. The best results were obtained by the static strategy, which has a significant preprocessing cost. However, the dynamic data distribution strategy, which uses a round-robin mechanism, obtained similar times without the need for the preprocessing stage. Although our proposal was applied to BWA using human genome data samples, the strategy can easily be applied to other sequence datasets and to alignment tools whose operating principles are similar to those of the BWA aligner.
Funding:	Agencia Estatal de Investigación TIN2017-84553-C2-1-R
Rights:	This document is subject to a Creative Commons license. Total or partial reproduction, distribution, and public communication of the work are permitted, provided that it is not for commercial purposes and that the authorship of the original work is acknowledged. The creation of derivative works is not permitted.
Language:	English
Document:	Article ; research ; Published version
Subject:	Sequence alignment ; Heterogeneous architecture ; Intel Xeon ; Intel Xeon Phi ; NUMA node
Published in:	Parallel Computing, Vol. 87 (Sep. 2019), p. 11-24, ISSN 0167-8191

DOI: 10.1016/j.parco.2019.04.014

14 p, 1.9 MB


Record created 2020-06-03, last modified 2026-01-30
