Systematic elucidation and in vivo validation of sequences enriched in hindbrain transcriptional control

Grzegorz M. Burzynski; Xylena Reed; Leila Taher; Zachary E. Stine; Takeshi Matsui; Ivan Ovcharenko; Andrew S. McCallion

doi:10.1101/gr.139717.112

School of Medicine

Research output: Contribution to journal › Article › peer-review

13 Scopus citations

Abstract

Illuminating the primary sequence encryption of enhancers is central to understanding the regulatory architecture of genomes. We have developed a machine learning approach to decipher motif patterns of hindbrain enhancers and identify 40,000 sequences in the human genome that we predict display regulatory control that includes the hindbrain. Consistent with their roles in hindbrain patterning, MEIS1, NKX6-1, as well as HOX and POU family binding motifs contributed strongly to this enhancer model. Predicted hindbrain enhancers are overrepresented at genes expressed in hindbrain and associated with nervous system development, and primarily reside in the areas of open chromatin. In addition, 77 (0.2%) of these predictions are identified as hindbrain enhancers on the VISTA Enhancer Browser, and 26,000 (60%) overlap enhancer marks (H3K4me1 or H3K27ac). To validate these putative hindbrain enhancers, we selected 55 elements distributed throughout our predictions and six low scoring controls for evaluation in a zebrafish transgenic assay. When assayed in mosaic transgenic embryos, 51/55 elements directed expression in the central nervous system. Furthermore, 30/34 (88%) predicted enhancers analyzed in stable zebrafish transgenic lines directed expression in the larval zebrafish hindbrain. Subsequent analysis of sequence fragments selected based upon motif clustering further confirmed the critical role of the motifs contributing to the classifier. Our results demonstrate the existence of a primary sequence code characteristic to hindbrain enhancers. This code can be accurately extracted using machine-learning approaches and applied successfully for de novo identification of hindbrain enhancers. This study represents a critical step toward the dissection of regulatory control in specific neuronal subtypes.

Original language	English (US)
Pages (from-to)	2278-2289
Number of pages	12
Journal	Genome Research
Volume	22
Issue number	11
DOIs	https://doi.org/10.1101/gr.139717.112
State	Published - Nov 2012

ASJC Scopus subject areas

Genetics
Genetics (clinical)

Cite this

@article{e000718aa3594d9baa0c6915db1a410f,

title = "Systematic elucidation and in vivo validation of sequences enriched in hindbrain transcriptional control",

abstract = "Illuminating the primary sequence encryption of enhancers is central to understanding the regulatory architecture of genomes. We have developed a machine learning approach to decipher motif patterns of hindbrain enhancers and identify 40,000 sequences in the human genome that we predict display regulatory control that includes the hindbrain. Consistent with their roles in hindbrain patterning, MEIS1, NKX6-1, as well as HOX and POU family binding motifs contributed strongly to this enhancer model. Predicted hindbrain enhancers are overrepresented at genes expressed in hindbrain and associated with nervous system development, and primarily reside in the areas of open chromatin. In addition, 77 (0.2%) of these predictions are identified as hindbrain enhancers on the VISTA Enhancer Browser, and 26,000 (60%) overlap enhancer marks (H3K4me1 or H3K27ac). To validate these putative hindbrain enhancers, we selected 55 elements distributed throughout our predictions and six low scoring controls for evaluation in a zebrafish transgenic assay. When assayed in mosaic transgenic embryos, 51/55 elements directed expression in the central nervous system. Furthermore, 30/34 (88%) predicted enhancers analyzed in stable zebrafish transgenic lines directed expression in the larval zebrafish hindbrain. Subsequent analysis of sequence fragments selected based upon motif clustering further confirmed the critical role of the motifs contributing to the classifier. Our results demonstrate the existence of a primary sequence code characteristic to hindbrain enhancers. This code can be accurately extracted using machine-learning approaches and applied successfully for de novo identification of hindbrain enhancers. This study represents a critical step toward the dissection of regulatory control in specific neuronal subtypes.",

author = "Burzynski, {Grzegorz M.} and Xylena Reed and Leila Taher and Stine, {Zachary E.} and Takeshi Matsui and Ivan Ovcharenko and McCallion, {Andrew S.}",

year = "2012",

month = nov,

doi = "10.1101/gr.139717.112",

language = "English (US)",

volume = "22",

pages = "2278--2289",

journal = "Genome Research",

issn = "1088-9051",

publisher = "Cold Spring Harbor Laboratory Press",

number = "11",

}

TY - JOUR

T1 - Systematic elucidation and in vivo validation of sequences enriched in hindbrain transcriptional control

AU - Burzynski, Grzegorz M.

AU - Reed, Xylena

AU - Taher, Leila

AU - Stine, Zachary E.

AU - Matsui, Takeshi

AU - Ovcharenko, Ivan

AU - McCallion, Andrew S.

PY - 2012/11

Y1 - 2012/11

AB - Illuminating the primary sequence encryption of enhancers is central to understanding the regulatory architecture of genomes. We have developed a machine learning approach to decipher motif patterns of hindbrain enhancers and identify 40,000 sequences in the human genome that we predict display regulatory control that includes the hindbrain. Consistent with their roles in hindbrain patterning, MEIS1, NKX6-1, as well as HOX and POU family binding motifs contributed strongly to this enhancer model. Predicted hindbrain enhancers are overrepresented at genes expressed in hindbrain and associated with nervous system development, and primarily reside in the areas of open chromatin. In addition, 77 (0.2%) of these predictions are identified as hindbrain enhancers on the VISTA Enhancer Browser, and 26,000 (60%) overlap enhancer marks (H3K4me1 or H3K27ac). To validate these putative hindbrain enhancers, we selected 55 elements distributed throughout our predictions and six low scoring controls for evaluation in a zebrafish transgenic assay. When assayed in mosaic transgenic embryos, 51/55 elements directed expression in the central nervous system. Furthermore, 30/34 (88%) predicted enhancers analyzed in stable zebrafish transgenic lines directed expression in the larval zebrafish hindbrain. Subsequent analysis of sequence fragments selected based upon motif clustering further confirmed the critical role of the motifs contributing to the classifier. Our results demonstrate the existence of a primary sequence code characteristic to hindbrain enhancers. This code can be accurately extracted using machine-learning approaches and applied successfully for de novo identification of hindbrain enhancers. This study represents a critical step toward the dissection of regulatory control in specific neuronal subtypes.

UR - http://www.scopus.com/inward/record.url?scp=84868307469&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84868307469&partnerID=8YFLogxK

U2 - 10.1101/gr.139717.112

DO - 10.1101/gr.139717.112

M3 - Article

C2 - 22759862

AN - SCOPUS:84868307469

SN - 1088-9051

VL - 22

SP - 2278

EP - 2289

JO - Genome Research

JF - Genome Research

IS - 11

ER -
