Benchmarks in antimicrobial peptide prediction are biased due to the selection of negative data

Sidorczuk, Katarzyna; Gagat, Przemysław; Pietluch, Filip; Kała, Jakub; Rafacz, Dominik; Bąkała, Laura; Słowik, Jadwiga; Kolenda, Rafał; Rödiger, Stefan; Fingerhut, Legana C H W; Cooke, Ira R; Mackiewicz, Paweł; Burdukiewicz, Michał

doi:10.1093/bib/bbac343

Permanent link: https://ddd.uab.cat/record/266412

Sidorczuk, Katarzyna (University of Wrocław)
Gagat, Przemysław (University of Wrocław)
Pietluch, Filip (University of Wrocław)
Kała, Jakub (Warsaw University of Technology)
Rafacz, Dominik (Warsaw University of Technology)
Bąkała, Laura (Warsaw University of Technology)
Słowik, Jadwiga (Warsaw University of Technology)
Kolenda, Rafał (Wrocław University of Environmental and Life Sciences)
Rödiger, Stefan (Brandenburg University of Technology Cottbus-Senftenberg)
Fingerhut, Legana C H W (James Cook University. Department of Molecular and Cell Biology)
Cooke, Ira R (James Cook University. Department of Molecular and Cell Biology)
Mackiewicz, Paweł (University of Wrocław)
Burdukiewicz, Michał (Universitat Autònoma de Barcelona. Institut de Biotecnologia i de Biomedicina "Vicent Villar Palasí")

Date:	2022
Abstract:	Antimicrobial peptides (AMPs) are a heterogeneous group of short polypeptides that target not only microorganisms but also viruses and cancer cells. Due to their lower selection for resistance compared with traditional antibiotics, AMPs have been attracting ever-growing attention from researchers, including bioinformaticians. Machine learning represents the most cost-effective method for novel AMP discovery, and consequently many computational tools for AMP prediction have recently been developed. In this article, we investigate the impact of negative data sampling on model performance and benchmarking. We generated 660 predictive models using 12 machine learning architectures, a single positive data set and 11 negative data sampling methods; the architectures and methods were defined on the basis of published AMP prediction software. Our results clearly indicate that similar training and benchmark data sets, i.e. those produced by the same or a similar negative data sampling method, positively affect model performance. Consequently, all the benchmark analyses that have been performed for AMP prediction models are significantly biased and, moreover, we do not know which model is the most accurate. To provide researchers with reliable information about the performance of AMP predictors, we also created a web server, AMPBenchmark, for fair model benchmarking. AMPBenchmark is available at http://BioGenies.info/AMPBenchmark.
Rights:	This document is subject to a Creative Commons license. Total or partial reproduction, distribution, public communication of the work and the creation of derivative works are permitted, even for commercial purposes, provided that the authorship of the original work is acknowledged.
Language:	English
Document:	Article ; research ; Published version
Subject:	Antimicrobial peptides ; Benchmarks ; Machine learning ; Negative sampling ; Prediction ; Reproducibility
Published in:	Briefings in Bioinformatics, Vol. 23, Issue 5 (September 2022), art. bbac343, ISSN 1477-4054
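
The study design summarized in the abstract (12 architectures, 11 negative data sampling methods, 660 models) can be pictured as a grid in which each model is trained with one sampling method and benchmarked against data from every sampling method. The Python sketch below is only an illustration under stated assumptions: the function names and the placeholder method labels are hypothetical, and the even split of 660 models across the 12 × 11 combinations (which would give five models per combination) is an inference from the arithmetic, not something the abstract states.

```python
from itertools import product
from statistics import mean

# Figures taken from the abstract.
N_ARCHITECTURES = 12
N_SAMPLING_METHODS = 11
N_MODELS = 660


def models_per_combination(n_models, n_architectures, n_sampling_methods):
    """Number of models per (architecture, sampling method) pair.

    Assumes the models are spread evenly over all pairs (an assumption,
    not stated in the abstract).
    """
    combinations = n_architectures * n_sampling_methods
    if n_models % combinations != 0:
        raise ValueError("models do not divide evenly across combinations")
    return n_models // combinations


def evaluation_grid(sampling_methods):
    """All (training method, benchmark method) pairs, flagged when they match."""
    return [
        (train, bench, train == bench)
        for train, bench in product(sampling_methods, repeat=2)
    ]


def mean_by_match(scores):
    """Average a metric separately for matched and mismatched pairs.

    `scores` maps (training method, benchmark method) to a metric value.
    A higher matched mean would be the kind of bias the abstract describes.
    """
    matched = [s for (train, bench), s in scores.items() if train == bench]
    mismatched = [s for (train, bench), s in scores.items() if train != bench]
    return mean(matched), mean(mismatched)


# Placeholder labels; the abstract does not name the sampling methods.
methods = [f"method_{i}" for i in range(1, N_SAMPLING_METHODS + 1)]
grid = evaluation_grid(methods)
```

Comparing the two means returned by `mean_by_match` is one simple way to check whether models score better on benchmarks built with the same negative sampling method used for their training data.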

DOI: 10.1093/bib/bbac343
PMID: 35988923

12 p, 1.5 MB


Record created 2022-10-10, last modified 2024-09-22
