TY - GEN
T1 - Advances in computational identification and modeling of DNA regulatory elements in the human genome
AU - Lee, Dongwon
AU - Beer, Michael A.
PY - 2013
Y1 - 2013
N2 - Identification of DNA regulatory elements in the human genome remains a significant challenge. Variation in these regulatory elements can contribute to disease in many ways by altering protein levels. Enhancers constitute an important class of these DNA regulatory elements, and a major component of current research is focused on a more complete understanding of enhancer function and improved techniques for enhancer detection. We recently developed a computational approach to identify enhancers from primary DNA sequence using a support vector machine (kmer-SVM) framework. Here we show that the kmer-SVM model can accurately predict tissue-specific enhancer activity without any prior knowledge about TF binding sites. We adapt this approach to predict genomic TF binding data generated by the ENCODE project, showing that genomic MYC binding can be accurately predicted from local DNA sequence with the kmer-SVM. We find similar accuracy with an SVM using PWMs representing known TF binding specificities. By integrating ChIP-seq and expression data, we show that while much of MYC binding is shared between ENCODE cell types and is promoter-proximal, cell-type-specific MYC binding is distal and is correlated with enhanced cell-specific expression of nearby (~50 kb) genes. The distinction between shared and cell-specific MYC binding is determined by DNA sequence variation around the canonical MYC binding site, which by itself cannot distinguish cell-specific binding events. These results suggest that tissue-specific enhancer activity is specified by primary DNA sequence, that local sequence context controls tissue-specific activity through cooperative TF interactions, and that local context sequence features can be identified from genomic binding data.
AB - Identification of DNA regulatory elements in the human genome remains a significant challenge. Variation in these regulatory elements can contribute to disease in many ways by altering protein levels. Enhancers constitute an important class of these DNA regulatory elements, and a major component of current research is focused on a more complete understanding of enhancer function and improved techniques for enhancer detection. We recently developed a computational approach to identify enhancers from primary DNA sequence using a support vector machine (kmer-SVM) framework. Here we show that the kmer-SVM model can accurately predict tissue-specific enhancer activity without any prior knowledge about TF binding sites. We adapt this approach to predict genomic TF binding data generated by the ENCODE project, showing that genomic MYC binding can be accurately predicted from local DNA sequence with the kmer-SVM. We find similar accuracy with an SVM using PWMs representing known TF binding specificities. By integrating ChIP-seq and expression data, we show that while much of MYC binding is shared between ENCODE cell types and is promoter-proximal, cell-type-specific MYC binding is distal and is correlated with enhanced cell-specific expression of nearby (~50 kb) genes. The distinction between shared and cell-specific MYC binding is determined by DNA sequence variation around the canonical MYC binding site, which by itself cannot distinguish cell-specific binding events. These results suggest that tissue-specific enhancer activity is specified by primary DNA sequence, that local sequence context controls tissue-specific activity through cooperative TF interactions, and that local context sequence features can be identified from genomic binding data.
KW - computational biology
KW - enhancers
KW - genomics
KW - transcriptional regulation
UR - http://www.scopus.com/inward/record.url?scp=84871572524&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84871572524&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-32183-2_81
DO - 10.1007/978-3-642-32183-2_81
M3 - Conference contribution
AN - SCOPUS:84871572524
SN - 9783642321825
T3 - IFMBE Proceedings
SP - 328
EP - 331
BT - 4th International Conference on Biomedical Engineering in Vietnam
T2 - 4th International Conference on the Development of Biomedical Engineering in Vietnam
Y2 - 8 January 2012 through 10 January 2012
ER -