
Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12008/36814
Title: Machine Learning methods for genome enabled prediction of complex traits : Benchmarking and robustness to marker elimination
Authors: Elenter, Juan
Etchebarne, Guillermo
Hounie, Ignacio
Fariello, María Inés
Lecumberry, Federico
Type: Poster
Keywords: Genomic prediction, Machine learning, Dimensionality reduction
Publication date: 2021
Abstract: A plethora of machine learning and statistical methods have been applied in the context of genome-enabled prediction. Here we address the prediction of complex traits from SNP marker data in agriculture. The datasets used present different levels of trait complexity. These are: Yeast yield, Holstein cattle milk yield, German bulls Sire Conception Rate, and Wheat yield. Population structure, number of samples, and number of SNPs also vary among datasets. We benchmark several popular models, including Bayesian and penalized linear regressions, kernel methods, and decision tree ensembles. Through exhaustive hyperparameter tuning we outperform state-of-the-art results in all datasets. Furthermore, we compare two genome codifications: one-hot encoding and additive encoding, the latter being the standard codification used in quantitative genetics. We show that, in these datasets, additive encoding outperforms categorical encodings despite the fact that the variables are categorical in nature. This difference in performance may be caused by the predominance of additive effects, the dimensionality increase, and the loss of the one-to-one correspondence between variables and biological markers. Regarding robustness to random marker elimination, we found that on all datasets most models present a negligible loss in predictive power even when trained on a small, random sample of markers. We argue that sample size limits the number of SNPs that are informative with respect to the downstream prediction task.
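The two codifications and the random marker elimination named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes genotypes are stored as allele counts 0, 1, 2 in a samples-by-SNPs array, and the function names (`additive_encoding`, `one_hot_encoding`, `random_marker_subset`) are hypothetical, introduced only for illustration.

```python
import numpy as np

def additive_encoding(genotypes):
    """Additive encoding: one numeric column per SNP holding the allele count (0, 1 or 2)."""
    return np.asarray(genotypes, dtype=float)

def one_hot_encoding(genotypes, n_classes=3):
    """One-hot encoding: each SNP becomes n_classes indicator columns (assumed classes 0..n_classes-1)."""
    g = np.asarray(genotypes, dtype=int)
    n_samples, n_snps = g.shape
    # Indexing the identity matrix yields shape (n_samples, n_snps, n_classes).
    return np.eye(n_classes)[g].reshape(n_samples, n_snps * n_classes)

def random_marker_subset(X, fraction, rng):
    """Keep a uniformly random fraction of the columns (markers) of X, without replacement."""
    n_markers = X.shape[1]
    k = max(1, int(round(fraction * n_markers)))
    cols = np.sort(rng.choice(n_markers, size=k, replace=False))
    return X[:, cols], cols

# Toy genotype matrix (made-up values): 2 samples, 3 SNPs.
toy = np.array([[0, 1, 2],
                [2, 2, 0]])
X_add = additive_encoding(toy)   # shape (2, 3)
X_hot = one_hot_encoding(toy)    # shape (2, 9): three times as many columns
X_sub, kept = random_marker_subset(X_add, 0.34, np.random.default_rng(0))
```

The one-hot matrix has three columns per SNP, which makes concrete the dimensionality increase and the loss of the one-to-one correspondence between variables and markers that the abstract mentions.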
Description: The experiments presented in this work were carried out using ClusterUy (site: https://cluster.uy).
Publisher: Cold Spring Harbor Laboratory (CSHL)
In: Probabilistic Modeling in Genomics : Virtual Meeting, 14-16 Apr. 2021, Cold Spring Harbor, NY, USA.
Funders: This work was partially funded by the ANII project FSDA 1-2018-1-154364.
Citation: Elenter, J., Etchebarne, G., Hounie, I. et al. Machine Learning methods for genome enabled prediction of complex traits [online]. Poster, 2021.
License: Creative Commons Attribution - NonCommercial - NoDerivatives License (CC - By-NC-ND 4.0)
Appears in collections: Publicaciones académicas y científicas - Instituto de Ingeniería Eléctrica

Files in this item:
File            Description     Size        Format
EEHFL21a.pdf    Final version   938.13 kB   Adobe PDF


This item is subject to a Creative Commons license.