Functionally-informed fine mapping

For most loci highlighted by genome-wide association studies of complex traits, the causal variant(s) cannot be pinpointed. These loci also tend to lack a protein-coding explanation. In addition to increasing sample sizes and looking across populations, we believe that functional annotations can improve the fine-mapping process. The machine learning methods for DNA sequence analysis described below train models that can produce variant effect predictions for diverse sets of cell type-specific regulatory functions. In initial work, we demonstrated how these predictions can be fed into a meta-predictor to discriminate causal eQTLs (Wang et al. 2021). We hope to extend this work to fine-map general phenotype associations, particularly for aging-related disease.
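As a rough illustration of the idea of feeding variant effect predictions into a meta-predictor, the sketch below computes per-track effects as the difference between predicted activity for the alternate and reference alleles, then combines them with a simple logistic function. This is not the method of Wang et al. 2021; the function names, the three example tracks, the weights, and the bias are all hypothetical values chosen for illustration.

```python
import numpy as np

def variant_effect_features(ref_preds, alt_preds):
    """Per-track effect: predicted activity with the alternate allele
    minus predicted activity with the reference allele."""
    ref = np.asarray(ref_preds, dtype=float)
    alt = np.asarray(alt_preds, dtype=float)
    return alt - ref

def meta_predictor_score(features, weights, bias=0.0):
    """Toy logistic meta-predictor (an assumption, not a published model):
    weights the magnitude of each track's effect and squashes to (0, 1)."""
    z = float(np.dot(np.abs(features), weights) + bias)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical model outputs for one variant across three regulatory tracks.
ref = [0.80, 0.10, 0.50]
alt = [0.20, 0.10, 0.55]

feats = variant_effect_features(ref, alt)

# Made-up weights; in practice these would be fit on labeled causal variants.
weights = np.array([2.0, 1.0, 0.5])
score = meta_predictor_score(feats, weights, bias=-1.0)
print(feats, score)
```

A variant that strongly changes a heavily weighted track receives a higher score, which could then serve as a prior or feature in a fine-mapping procedure.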


Single cell genomics across lifespans

To dissect the causal regulatory networks that go awry in aging, we seek high-throughput, rich and expressive profiles of cellular state. We’ve profiled several tissues in young and old mice using single cell genomics and explored the role of cell identity and tissue environment in the aging process (Kimmel et al. 2019). In follow-up work, we hypothesized that asking the cells to perform a function such as differentiation would reveal additional insights, which proved true (Kimmel et al. 2021). Moving forward, we’ll continue to acquire such profiles across organisms with this rapidly advancing technology in denser time series, diverse genetic backgrounds, perturbed settings, and paired with deeper physiological phenotyping.


Deep convolutional neural networks for DNA sequence analysis

Convolutional neural networks (CNNs) are a powerful machine learning tool for learning representations of DNA sequences that model functional activity. I developed a package called Basset, where I demonstrated that CNNs are far more accurate than previous motif- or k-mer-based approaches for predicting cell type-specific DNaseI hypersensitivity (Kelley et al. 2016). Subsequently, I developed a modified framework called Basenji, where I extended CNN predictions to very long sequences to better consider distal interactions between regulatory elements (Kelley et al. 2018). In our latest iteration, called the Enformer, we made use of self-attention layers to extend the model's visibility to long-range regulatory interactions (Avsec et al. 2021).
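To make the basic building block concrete, here is a minimal sketch of what a single convolutional filter does on a DNA sequence: the sequence is one-hot encoded and the filter is slid along it, producing a score at each position. This is not the Basset, Basenji, or Enformer code; the helper names, the example sequence, and the hand-built "TGA" filter are assumptions for illustration, whereas real models learn many filters from data and stack further layers on top.

```python
import numpy as np

ALPHABET = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) array.
    Bases outside ACGT (e.g. N) become all-zero rows."""
    x = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = ALPHABET.find(base)
        if j >= 0:
            x[i, j] = 1.0
    return x

def conv1d_scan(x, filt):
    """Slide a (k, 4) filter along a (L, 4) sequence without padding,
    returning L - k + 1 scores (one per window)."""
    k = filt.shape[0]
    n = x.shape[0] - k + 1
    return np.array([float(np.sum(x[i:i + k] * filt)) for i in range(n)])

# Hypothetical filter that exactly matches the 3-mer "TGA".
filt = one_hot("TGA")
seq = "ACTGACGT"

scores = conv1d_scan(one_hot(seq), filt)
best_position = int(np.argmax(scores))  # start of the best-matching window
print(scores, best_position)
```

Each score counts how many positions in the window agree with the filter, so the highest score marks where the motif-like pattern occurs in the sequence.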


Transposable element influence on regulatory evolution

Transposable elements (TEs) are DNA sequences capable of copying to new genomic locations. Long considered genomic parasites and distributors of junk DNA, TEs now appear to be an important mechanism by which genomes change. Distributing similar sequences across the genome has certain advantages for regulatory network evolution, and we're now discovering many specific examples. While exploring the relationship between TEs and long noncoding RNA genes (lncRNAs), I discovered a family of stem cell-specific lncRNAs driven by a TE family called HERVH (Kelley and Rinn 2012). I've also studied the role of TEs in distributing RNA binding protein motifs throughout mRNAs and lncRNAs, particularly HuR binding sites in Alu elements (Kelley et al. 2014). As we better understand mammalian gene regulation, I expect there will be much more to say about the role of TEs in the evolution of regulatory sequence.


Roles for RNA-protein interactions in chromatin dynamics

Mammalian genomes are nearly ubiquitously transcribed into RNA, especially at active regulatory elements such as enhancers, but mechanistic roles for these noncoding RNAs have been challenging to pin down. Long noncoding RNAs (lncRNAs) and shorter enhancer RNAs show minimal conservation of sequence or structure and change rather freely. We and others have shown that many chromatin-associated proteins bind these RNAs non-specifically, and these interactions can influence chromatin (Hendrickson/Kelley et al. 2016). To my eye, this line of research offers the most promising avenue to explain both the absence of sequence conservation and the apparent functional roles for lncRNAs in regulating gene expression.