TY - JOUR
T1 - StereoGene
T2 - Rapid estimation of genome-wide correlation of continuous or interval feature data
AU - Stavrovskaya, Elena D.
AU - Niranjan, Tejasvi
AU - Fertig, Elana J.
AU - Wheelan, Sarah J.
AU - Favorov, Alexander V.
AU - Mironov, Andrey A.
N1 - Funding Information:
This work was supported by Russian Science Foundation [grant 14-24-00155]; by National Institutes of Health [grants P30 CA006973, NCI R01CA177669]; by Allegheny Health Network-Johns Hopkins Cancer Research Fund and JHU IDIES/Moore Foundation and by Russian Foundation for Basic Research [grants 14-04-01872, 14-04-00576].
Publisher Copyright:
© The Author 2017. Published by Oxford University Press. All rights reserved.
PY - 2017/10/15
Y1 - 2017/10/15
AB - Motivation: Genomic features with similar genome-wide distributions are generally hypothesized to be functionally related; for example, colocalization of histones and transcription start sites indicates chromatin regulation of transcription factor activity. Therefore, statistical algorithms that compute spatial, genome-wide correlation among genomic features are required. Results: Here, we propose a method, StereoGene, that rapidly estimates genome-wide correlation between pairs of genomic features. These features may represent high-throughput data mapped to a reference genome or sets of genomic annotations in that reference genome. StereoGene correlates continuous data directly, avoiding data binarization and the resulting loss of information. Correlations are computed among neighboring genomic positions using kernel correlation. By representing the correlation as a function of genomic position, StereoGene outputs a local correlation track as part of the analysis. StereoGene also accounts for confounders such as input DNA by partial correlation. We apply our method to numerous comparisons of ChIP-Seq datasets from the Human Epigenome Atlas and FANTOM CAGE to demonstrate its wide applicability. We observe changes in the correlation between epigenomic features across developmental trajectories of several tissue types, consistent with known biology, and find a novel spatial correlation of CAGE clusters with donor splice sites and with poly(A) sites. These analyses illustrate the broad applicability of StereoGene to regulatory genomics. Availability and implementation: The StereoGene C++ source code, program documentation, Galaxy integration scripts and examples are available from the project homepage http://stereogene.bioinf.fbb.msu.ru/ Contact: favorov@sensi.org Supplementary information: Supplementary data are available at Bioinformatics online.
UR - http://www.scopus.com/inward/record.url?scp=85031784666&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85031784666&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btx379
DO - 10.1093/bioinformatics/btx379
M3 - Article
C2 - 29028265
AN - SCOPUS:85031784666
SN - 1367-4803
VL - 33
SP - 3158
EP - 3165
JO - Bioinformatics
JF - Bioinformatics
IS - 20
ER -