Deep convolutional neural networks for DNA sequence analysis

Convolutional neural networks (CNNs) are a powerful machine learning tool for learning representations of DNA sequences that model functional activity. I developed a package called Basset and used it to demonstrate that CNNs are far more accurate than previous motif- or k-mer-based approaches at predicting cell type-specific DNaseI hypersensitivity (Kelley et al. 2016). Moving forward, I believe CNNs will be important first-level components of models of gene regulation.
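
As a rough illustration of why a convolutional first layer suits DNA, the sketch below one-hot encodes a sequence and slides a single filter along it, much as a motif scan would. It is a toy example in plain Python and is not taken from Basset. The helper names (`one_hot`, `conv1d_scan`, `motif_filter`) and the hand-set +1/-1 filter weights are assumptions made for illustration; in a real CNN the filter weights would be learned from data.

```python
# Toy sketch of a first convolutional layer on DNA (hypothetical helpers, not Basset code).
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 4 floats (A, C, G, T); unknown bases such as N become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def conv1d_scan(encoded, filt):
    """Slide a (width x 4) filter along the sequence with no padding, giving one score per window."""
    width = len(filt)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = 0.0
        for offset in range(width):
            for channel in range(4):
                score += encoded[start + offset][channel] * filt[offset][channel]
        scores.append(score)
    return scores

def relu(scores):
    """Rectified linear activation: negative scores are clipped to zero."""
    return [max(0.0, s) for s in scores]

def motif_filter(motif):
    """Hand-set filter (an assumption for illustration): +1 for the motif base, -1 for any other base."""
    return [[1.0 if base == b else -1.0 for b in BASES] for base in motif]

seq = "GGTATAAC"
activations = relu(conv1d_scan(one_hot(seq), motif_filter("TATA")))
best = max(range(len(activations)), key=activations.__getitem__)
print(activations, best)
```

The filter responds most strongly at the window that matches its preferred motif exactly. A trained network would stack many such filters and feed their activations into deeper layers.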

Transposable element influence on regulatory evolution

Transposable elements (TEs) are DNA sequences capable of copying themselves to new genomic locations. Long considered genomic parasites and distributors of junk DNA, TEs now appear to be an important mechanism by which genomes change. Spreading copies of a similar sequence across the genome has certain advantages for regulatory network evolution, and we're now discovering many specific examples. While exploring the relationship between TEs and long noncoding RNA genes (lncRNAs), I discovered a family of stem cell-specific lncRNAs driven by a TE family called HERVH (Kelley and Rinn 2012). I've also studied the role of TEs in distributing RNA binding protein motifs throughout mRNAs and lncRNAs, particularly HuR binding sites in Alu elements (Kelley et al. 2014). As we better understand mammalian gene regulation, I expect there will be much more to say about the role of TEs in the evolution of regulatory sequence.

Roles for RNA-protein interactions in chromatin dynamics

Mammalian genomes are transcribed into RNA nearly ubiquitously, especially at active regulatory elements such as enhancers, but mechanistic roles for these noncoding RNAs have been challenging to pin down. Long noncoding RNAs (lncRNAs) and shorter enhancer RNAs show minimal conservation of sequence or structure and change rather freely. We and others have shown that many chromatin-associated proteins bind these RNAs non-specifically, and that these interactions can influence chromatin (Hendrickson/Kelley et al. 2016). To my eye, this line of research offers the most promising avenue for reconciling the absence of sequence conservation with the apparent functional roles of lncRNAs in regulating gene expression.