Web of Science: 16 citations, Scopus: 16 citations, Google Scholar: citations
HIV drug resistance prediction with weighted categorical kernel functions
Ramon, Elies (Centre de Recerca en Agrigenòmica)
Belanche Muñoz, Lluís A. (Lluís Antoni) (Universitat Politècnica de Catalunya. Departament de Ciències de la Computació)
Perez-Enciso, Miguel (Centre de Recerca en Agrigenòmica)

Date: 2019
Abstract: Background: Antiretroviral drugs are a very effective therapy against HIV infection. However, the high mutation rate of HIV permits the emergence of variants that can be resistant to drug treatment. Predicting drug resistance in previously unobserved variants is therefore very important for optimal medical treatment. In this paper, we propose the use of weighted categorical kernel functions to predict drug resistance from virus sequence data. These kernel functions are very simple to implement and can take into account particularities of HIV data, such as allele mixtures, and weigh the importance of each protein residue, since not all positions contribute equally to resistance. Results: We analyzed 21 drugs of four classes: protease inhibitors (PI), integrase inhibitors (INI), nucleoside reverse transcriptase inhibitors (NRTI) and non-nucleoside reverse transcriptase inhibitors (NNRTI). We compared two categorical kernel functions, Overlap and Jaccard, against two well-known non-categorical kernel functions (Linear and RBF) and Random Forest (RF). Weighted versions of these kernels were also considered, with weights obtained from the RF decrease in node impurity. The Jaccard kernel was the best method, in either its weighted or unweighted form, for 20 of the 21 drugs. Conclusions: The results show that kernels that take into account both the categorical nature of the data and the presence of mixtures consistently yield the best prediction model. The advantage of including weights depended on the protein targeted by the drug. For reverse transcriptase, weights based on the relative importance of each position clearly increased prediction performance, while the improvement for protease was much smaller. This seems to be related to the distribution of weights, as measured by the Gini index.
All methods described, together with documentation and examples, are freely available at https://bitbucket.org/elies-ramon/catkern.
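
The abstract names two categorical kernels, Overlap and Jaccard, and a weighted variant of each, but this record does not give their formulas. The following is a minimal Python sketch under stated assumptions, not the authors' implementation (see the repository above for that). It assumes each sequence is a list of allele sets, one set per protein position, so that a mixture is simply a set with more than one residue. It also assumes that Overlap scores a position 1 only when the two allele sets are identical, that Jaccard scores it as intersection size over union size, and that weights are normalised to sum to 1. The function names and the toy data are hypothetical.

```python
from typing import AbstractSet, List, Optional, Sequence

# One position of a sequence: the set of residues observed there.
# A set with more than one element represents an allele mixture (assumption).
Position = AbstractSet[str]


def _normalised_weights(weights: Optional[Sequence[float]], d: int) -> List[float]:
    """Return d non-negative weights summing to 1 (uniform if none are given)."""
    if weights is None:
        return [1.0 / d] * d
    if len(weights) != d:
        raise ValueError("need exactly one weight per position")
    total = float(sum(weights))
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    return [w / total for w in weights]


def overlap_kernel(x: Sequence[Position], y: Sequence[Position],
                   weights: Optional[Sequence[float]] = None) -> float:
    """Weighted share of positions whose allele sets are identical."""
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    w = _normalised_weights(weights, len(x))
    return sum(wi for wi, a, b in zip(w, x, y) if a == b)


def jaccard_kernel(x: Sequence[Position], y: Sequence[Position],
                   weights: Optional[Sequence[float]] = None) -> float:
    """Weighted mean of the per-position Jaccard index |a & b| / |a | b|."""
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    w = _normalised_weights(weights, len(x))
    total = 0.0
    for wi, a, b in zip(w, x, y):
        union = a | b
        if not union:
            raise ValueError("every position needs at least one allele")
        total += wi * len(a & b) / len(union)
    return total


# Toy example: three positions, the third one is a K/R mixture in seq_a.
seq_a = [{"L"}, {"M"}, {"K", "R"}]
seq_b = [{"L"}, {"I"}, {"K"}]
importance = [2.0, 1.0, 1.0]  # hypothetical per-position importances

print(overlap_kernel(seq_a, seq_b), jaccard_kernel(seq_a, seq_b))
print(overlap_kernel(seq_a, seq_b, importance),
      jaccard_kernel(seq_a, seq_b, importance))
```

In this sketch the Jaccard score gives partial credit to the mixture at the third position, whereas Overlap counts it as a mismatch. In the setting the abstract describes, the per-position importances would come from the Random Forest decrease in node impurity rather than being set by hand.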
Funding: Ministerio de Economía y Competitividad AGL2016-78709-R
Ministerio de Economía y Competitividad SEV-2015-0533
Ministerio de Economía y Competitividad BFU2016-77236-P
Rights: This document is subject to a Creative Commons license. Total or partial reproduction, distribution, public communication of the work and the creation of derivative works are permitted, even for commercial purposes, provided that the authorship of the original work is acknowledged.
Language: English
Document: Article ; research ; Published version
Subject: HIV ; Drug resistance prediction ; Categorical kernel ; Weighted kernel ; PI ; NRTI ; NNRTI ; INI ; Machine learning ; Support vector machine ; Random Forest ; Kernel PCA
Published in: BMC Bioinformatics, Vol. 20 (July 2019), art. 410, ISSN 1471-2105

DOI: 10.1186/s12859-019-2991-2
PMID: 31362714


13 p, 1.6 MB


Record created 2020-06-03, last modified 2022-11-16


