
Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12008/28582
Title: DNAI : Machine learning for genome enabled prediction of complex traits in agriculture
Authors: Elenter, Juan
Etchebarne, Guillermo
Hounie, Ignacio
Degree obtained: Electrical Engineer (Ingeniero Electricista)
Degree-granting institution: Universidad de la República (Uruguay). Facultad de Ingeniería.
Advisors: Fariello, María Inés
Lecumberry, Federico
Type: Undergraduate thesis
Keywords: Deep learning, Genomic prediction, Neural networks, Graphs
Publication date: 2021
Abstract: Genome enabled prediction of complex traits aims to predict a measurable characteristic of an organism from its genetic information. In the present work we address diverse traits and organisms, including yeast growth, wheat yield, Jersey bull fertility and Holstein cattle milk yield. We benchmark several popular Machine Learning models: Bayesian and penalized linear regressions, kernel methods, and decision tree ensembles. Through exhaustive hyperparameter tuning we outperform state-of-the-art results on most datasets. We also compare two encoding techniques for input data and perform ablation studies to assess robustness to the elimination of genetic markers, i.e., input features. We then explore different Deep Learning architectures for this task. We propose and evaluate CNN architectures, showing that residual connections improve performance but that in some cases Fully Connected Networks outperform CNNs. We link this to the fact that absolute positions are relevant in genomes, and thus the translational equivariance of CNNs may not be an adequate inductive bias for this problem. In addition, we explore using PCA and t-SNE to map input features to two-dimensional image-like feature maps used as inputs to 2D-CNN architectures. We assess the effectiveness of these dimensionality reduction techniques when used to construct those mappings, and find that in some cases random mappings perform comparably. We also propose a method to construct these image-like feature maps based on an approximation to the Fermat distance. Furthermore, we evaluate graph neural network architectures by formulating trait prediction as a node regression problem on a population graph, where each node represents an individual and edges represent associations between their genetic information. We evaluate the transferability of these graph-based models and find that the extent to which they exploit neighbourhood information is limited. We also propose a model combining CNN and GNN architectures, which outperforms all other models in Holstein cattle milk yield prediction. Lastly, we propose directly optimising Pearson correlation, which is commonly used to evaluate model performance even though MSE is the loss usually minimised. Although this loss does not penalise learning an affine transformation of the actual phenotypes, we show that this affine transformation can be estimated from training data, leading to models with both lower MSE and higher predictive correlations.
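The last point of the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the function names `pearson_loss` and `fit_affine`, the use of NumPy, and the simulated data are all assumptions made for illustration. The idea shown is that a loss based only on Pearson correlation is unchanged by a positive rescaling and shift of the predictions, so the scale and offset can be recovered afterwards by a least-squares fit on training data.

```python
import numpy as np

def pearson_loss(y_pred, y_true):
    """Negative Pearson correlation (hypothetical name); lower is better."""
    yp = y_pred - y_pred.mean()
    yt = y_true - y_true.mean()
    # Small constant guards against division by zero for constant inputs.
    return -(yp @ yt) / (np.linalg.norm(yp) * np.linalg.norm(yt) + 1e-12)

def fit_affine(y_pred_train, y_train):
    """Least-squares slope a and intercept b such that a * y_pred + b ~ y."""
    a, b = np.polyfit(y_pred_train, y_train, deg=1)
    return a, b

# Simulated data, for illustration only (not thesis data or results).
rng = np.random.default_rng(0)
y_train = rng.normal(size=200)
# Stand-in for the raw outputs of a model trained on correlation alone:
# well correlated with the phenotypes, but on the wrong scale and offset.
raw_train = 0.1 * y_train + 5.0 + 0.01 * rng.normal(size=200)

a, b = fit_affine(raw_train, y_train)
calibrated = a * raw_train + b

mse_raw = np.mean((raw_train - y_train) ** 2)
mse_cal = np.mean((calibrated - y_train) ** 2)
loss_raw = pearson_loss(raw_train, y_train)
loss_cal = pearson_loss(calibrated, y_train)
```

In this sketch the affine map is fitted and evaluated on the same simulated training set. In practice the pair `(a, b)` estimated on training data would be applied to predictions for held-out individuals.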
Publisher: Udelar.FI.
Citation: Elenter, J., Etchebarne, G. and Hounie, I. DNAI : Machine learning for genome enabled prediction of complex traits in agriculture [online]. Undergraduate thesis. Montevideo : Udelar. FI. IIE, 2021.
License: Creative Commons Attribution - NonCommercial - NoDerivatives license (CC BY-NC-ND 4.0)
Appears in collections: Tesis de grado - Instituto de Ingeniería Eléctrica

Files in this item:
File         Description            Size       Format
EEH21.pdf    Undergraduate thesis   35.18 MB   Adobe PDF


This item is subject to a Creative Commons license.