Dr. Hai Wang presents Buckler Lab research on Deep Learning of Genomic Variation and Regulatory Network at the upcoming 2019 PAG conference.
Discovery of genetic variation underlying quantitative traits is a key challenge for population genetics.
To uncover the genetic variation that produces quantitative traits, the team at the Buckler Lab designed a deep learning framework with two parts: 1) de novo prediction of endophenotypes from genetic variants using deep learning models; and 2) use of the predicted endophenotypes to predict important agronomic traits.
We propose that this framework is an important complement to traditional association studies in three respects:
1) Because the models predicting endophenotypes are in essence simulations of molecular biological processes, they are effective not only for common alleles but also for rare alleles, and even for novel alleles that the models have never seen before.
2) Deep learning models enable in silico breakage of linkage disequilibrium (LD), thereby facilitating prediction of causal variants even in the presence of strong LD.
3) By training multiple models on diverse molecular mechanisms modulating endophenotypes, it is possible to gain a mechanistic understanding of causal variants.
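The idea behind point 2 can be illustrated with a minimal sketch of in silico mutagenesis: each variant is introduced alone on the reference sequence and scored, so variants that are always inherited together in real populations can still be evaluated separately. The sketch assumes a trained model is available as a function from sequence to score. The `toy_model` below is a hypothetical stand-in that only checks for a TATA-box motif; it is not the Buckler Lab model, and the sequence and variant positions are made up for illustration.

```python
from typing import Callable, Dict


def toy_model(seq: str) -> float:
    """Hypothetical stand-in for a trained model: 1.0 if a TATA box is present."""
    return 1.0 if "TATAAA" in seq else 0.0


def variant_effects(
    ref_seq: str,
    variants: Dict[int, str],
    model: Callable[[str], float],
) -> Dict[int, float]:
    """Score each variant on its own against the reference background.

    `variants` maps a 0-based position to the alternate base. Because every
    variant is tested in isolation, linkage between variants does not matter.
    """
    baseline = model(ref_seq)
    effects = {}
    for pos, alt in variants.items():
        mutated = ref_seq[:pos] + alt + ref_seq[pos + 1:]
        effects[pos] = model(mutated) - baseline
    return effects


# Two variants assumed to be in strong LD (illustrative data only).
ref = "GGCTATAAAGGCCGT"
linked_variants = {5: "C", 12: "A"}
effects = variant_effects(ref, linked_variants, toy_model)
print(effects)  # {5: -1.0, 12: 0.0}
```

In this toy case only the variant at position 5 disrupts the motif, so it is the one flagged as putatively causal, even though an association test could not distinguish it from the linked variant at position 12.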
Figure 1 below shows the pseudogene model, which takes promoter and/or terminator sequences as predictors of binary expression levels:
Graphic a is a schematic representation of the architecture of the pseudogene model.
Graphic b is the distribution of log-transformed maximal TPMs for all maize genes across 422 RNA-Seq samples. Genes are categorized into unexpressed genes (blue), moderately-expressed genes (green), and highly-expressed genes (red).
Graphic c displays the accuracy and auROC of the pseudogene model trained on the Off/On gene set and the Off/High gene set, using promoters, terminators, or both promoter and terminator sequences as predictors.
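The data preparation and evaluation described for Figure 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the lab's pipeline: the one-hot encoding is a common input format for sequence models, the thresholds in `expression_class` are hypothetical (the post does not give the cutoffs), and the mapping of categories to the Off/On and Off/High sets in `binary_label` is one plausible reading of those names.

```python
import math
from typing import List, Optional, Sequence

BASES = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """Encode DNA as rows of [A, C, G, T]; unknown bases such as N become all zeros."""
    return [[int(base == b) for b in BASES] for base in seq.upper()]


def expression_class(
    tpms: Sequence[float],
    off_max_log: float = 0.0,   # hypothetical threshold
    high_min_log: float = 5.0,  # hypothetical threshold
) -> str:
    """Categorize a gene by log2(max TPM + 1) across its RNA-Seq samples."""
    log_max = math.log2(max(tpms) + 1.0)
    if log_max <= off_max_log:
        return "off"
    if log_max >= high_min_log:
        return "high"
    return "moderate"


def binary_label(cls: str, gene_set: str) -> Optional[int]:
    """Assumed labeling: 'off_on' uses all genes; 'off_high' drops moderate genes."""
    if cls == "off":
        return 0
    if gene_set == "off_on":
        return 1
    return 1 if cls == "high" else None  # None means excluded from the set


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(one_hot("ACGN"))
print(expression_class([0.0, 0.0]), expression_class([1.0, 3.0]), expression_class([3.0, 100.0]))
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise `auroc` is quadratic in the number of genes and is meant only to show what the metric measures; a library implementation would be the practical choice for real data.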
The advantages of this deep learning framework discussed above, combined with high-throughput genotyping/phenotyping techniques and genome editing, will be helpful in the upcoming "Breeding 4.0" era, in which beneficial genetic variants are rationally combined with unprecedented efficiency.