Optimized next-generation sequencing genotype-haplotype calling for genome variability analysis

Navarro Fernández, Javier; Nevado, Bruno; Hernández Budé, Porfidio; Vera Rodríguez, Gonzalo; Ramos Onsins, Sebastián Ernesto

doi:10.1177/1176934317723884

Permanent link: https://ddd.uab.cat/record/186081

Navarro Fernández, Javier (Universitat Autònoma de Barcelona. Departament d'Arquitectura de Computadors i Sistemes Operatius)
Nevado, Bruno (University of Oxford. Department of Plant Sciences)
Hernández Budé, Porfidio (Universitat Autònoma de Barcelona. Departament d'Arquitectura de Computadors i Sistemes Operatius)
Vera Rodríguez, Gonzalo (Centre de Recerca en Agrigenòmica)
Ramos Onsins, Sebastián Ernesto (Centre de Recerca en Agrigenòmica)

Date:	2017
Abstract:	The accurate estimation of nucleotide variability from next-generation sequencing data is challenged by the high number of sequencing errors produced by new sequencing technologies, especially for nonmodel species, where reference sequences may not be available and read depth may be low because of limited budgets. The most popular single-nucleotide polymorphism (SNP) callers are designed to achieve high SNP recovery and a low false discovery rate, but not to account appropriately for the frequency of the variants. In contrast, algorithms designed to account for the frequency of SNPs give precise estimates of the levels and patterns of variability; they focus on the unbiased estimation of variability rather than on high SNP recovery. Here, we implemented a fast and optimized parallel algorithm that includes the method developed by Roesti et al. and Lynch, which estimates the genotype of each individual at each site, allowing both bases of the genotype, a single base, or none to be called. This algorithm does not consider the reference and is therefore independent of biases related to the specified reference nucleotide. The pipeline starts from a BAM file converted to pileup or mpileup format, and the software outputs a FASTA file. The new program not only reduces running times but, through its improved use of resources, can also be run on both smaller computers and large parallel computers, extending its benefits to a wider range of researchers. The output file can be analyzed using software for population genetics analysis, such as the R library PopGenome, the software VariScan, and the program mstatspop for analyses that consider positions with missing data.
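The idea of calling both bases, a single base, or none at a site can be illustrated with a minimal Python sketch. It is not the likelihood method of Roesti et al. and Lynch described in the abstract. It substitutes simple read-count thresholds, and the function names (`count_bases`, `call_genotype`) and threshold values (`min_one`, `min_both`, `min_minor`) are hypothetical. The reference base is used only to decode the `.` and `,` symbols of the pileup format, not to favor any allele.

```python
import re
from collections import Counter


def count_bases(ref, bases):
    """Count A/C/G/T observations in the read-bases column of one pileup line."""
    counts = Counter()
    ref = ref.upper()
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            # Read-start marker; the next character is a mapping quality.
            i += 2
            continue
        if ch in "+-":
            # Indel: sign, length digits, then that many inserted/deleted bases.
            length = re.match(r"\d+", bases[i + 1:]).group()
            i += 1 + len(length) + int(length)
            continue
        if ch in ".,":
            # Match to the reference on the forward or reverse strand.
            if ref in ("A", "C", "G", "T"):
                counts[ref] += 1
        elif ch.upper() in ("A", "C", "G", "T"):
            counts[ch.upper()] += 1
        # '$' (read end), '*' (deletion placeholder) and 'N' are ignored.
        i += 1
    return counts


def call_genotype(counts, min_one=3, min_both=6, min_minor=2):
    """Return a two-character genotype; 'N' marks an uncalled base.

    Hypothetical count thresholds stand in for the likelihood-based
    decision of the published method.
    """
    depth = sum(counts.values())
    if depth < min_one:
        return "NN"                      # too few reads: call nothing
    ranked = counts.most_common(2)
    major = ranked[0][0]
    if len(ranked) > 1 and ranked[1][1] >= min_minor:
        return major + ranked[1][0]      # two supported alleles: heterozygote
    if depth >= min_both:
        return major + major             # enough reads to call both bases
    return major + "N"                   # call one base, leave the other missing


# One pileup line: chromosome, position, reference base, depth, bases, qualities.
line = "chr1\t100\tA\t8\t..,,GgG.\tIIIIIIII"
chrom, pos, ref, depth, bases = line.split("\t")[:5]
genotype = call_genotype(count_bases(ref, bases))
```

In a full pipeline, the per-site genotypes of each individual would be concatenated into two sequences and written as FASTA records, with `N` marking missing data.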
Grants:	Agencia Estatal de Investigación AGL2016-78709-R ; Ministerio de Economía y Competitividad SEV-2015-0533 ; Ministerio de Economía y Competitividad TIN2014-53234-C2-1-R
Note:	Other grants: CERCA Programme/Generalitat de Catalunya
Rights:	This document is subject to a Creative Commons license. Total or partial reproduction, distribution, public communication of the work, and the creation of derivative works are permitted, provided that it is not for commercial purposes and that the authorship of the original work is acknowledged.
Language:	English
Document:	Article ; research ; Published version
Subject:	HPC ; MPI ; Population genomics ; SNP caller ; Next-generation sequencing ; Parallelization
Published in:	Evolutionary bioinformatics online, Vol. 13 (August 2017), p. 1-11, ISSN 1176-9343

DOI: 10.1177/1176934317723884
PMID: 28894353

11 p, 1.1 MB

The record appears in these collections:
Research literature > UAB research groups literature > Research Centres and Groups (research output) > Experimental sciences > CRAG (Centre for Research in Agricultural Genomics)
Articles > Research articles
Articles > Published articles

Record created 2018-02-07, last modified 2026-01-29
