Web of Science: 20 citations, Scopus: 24 citations
PlasForest: a homology-based random forest classifier for plasmid detection in genomic datasets
Pradier, Léa (Université de Montpellier)
Tissot, Tazzio (Universitat Autònoma de Barcelona. Departament de Genètica i de Microbiologia)
Fiston-Lavier, Anna-Sophie (Centre National de la Recherche Scientifique (France))
Bedhomme, Stéphanie (Université de Montpellier)

Date: 2021
Abstract: Plasmids are mobile genetic elements that often carry accessory genes, and are vectors for horizontal transfer between bacterial genomes. Plasmid detection in large genomic datasets is crucial to analyze their spread and quantify their role in bacterial adaptation, particularly in antibiotic resistance propagation. Bioinformatics methods have been developed to detect plasmids. However, they suffer from low sensitivity (i.e., most plasmids remain undetected) or low precision (i.e., these methods identify chromosomes as plasmids), and are overall not adapted to identify plasmids in whole genomes that are not fully assembled (contigs and scaffolds). We developed PlasForest, a homology-based random forest classifier identifying bacterial plasmid sequences in partially assembled genomes. Without knowing the taxonomical origin of the samples, PlasForest identifies contigs as plasmids or chromosomes with an F1 score of 0.950. Notably, it can detect 77.4% of plasmid contigs below 1 kb with 2.8% of false positives and 99.9% of plasmid contigs over 50 kb with 2.2% of false positives. PlasForest outperforms other currently available tools on genomic datasets by being both sensitive and precise. The performance of PlasForest on metagenomic assemblies is currently well below that of other k-mer-based methods, and we discuss how homology-based approaches could improve plasmid detection in such datasets. The online version contains supplementary material available at 10.1186/s12859-021-04270-w.
Funding: European Commission 682819
Rights: This document is subject to a Creative Commons license. Total or partial reproduction, distribution, public communication of the work, and the creation of derivative works are permitted, even for commercial purposes, provided that the authorship of the original work is acknowledged.
Language: English
Document: Article; research; Published version
Subject: Plasmid identification; Homology; Random forest classifier; Genomic datasets
Published in: BMC Bioinformatics, Vol. 22 (June 2021), art. 349, ISSN 1471-2105

DOI: 10.1186/s12859-021-04270-w
PMID: 34174810


17 p, 2.2 MB

This record appears in the collections:
Articles > Research articles
Articles > Published articles

Record created 2021-07-12, last modified 2022-06-30


