Predicting enhancer activity and variant impact using gkm-SVM

Research output: Contribution to journal, Article, peer-review

14 Scopus citations


We participated in the Critical Assessment of Genome Interpretation eQTL challenge to further test computational models of regulatory variant impact and their association with human disease. Our prediction model is based on a discriminative gapped-kmer SVM (gkm-SVM) trained on genome-wide chromatin accessibility data in the cell type of interest. The comparisons with massively parallel reporter assays (MPRA) in lymphoblasts show that gkm-SVM is among the most accurate prediction models even though all other models used the MPRA data for model training, and gkm-SVM did not. In addition, we compare gkm-SVM with other MPRA datasets and show that gkm-SVM is a reliable predictor of expression and that deltaSVM is a reliable predictor of variant impact in K562 cells and mouse retina. We further show that DHS (DNase-I hypersensitive sites) and ATAC-seq (assay for transposase-accessible chromatin using sequencing) data are equally predictive substrates for training gkm-SVM, and that DHS regions flanked by H3K27Ac and H3K4me1 marks are more predictive than DHS regions alone.
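As a rough illustration of the variant-impact score mentioned above, the sketch below computes a deltaSVM-style value as the summed weight of all k-mers overlapping the alternate allele minus the same sum for the reference allele. It assumes that a trained model has already been reduced to a table of per-k-mer weights. The function names, the toy weights, and the use of short contiguous k-mers are illustrative assumptions and are not taken from the article.

```python
from typing import Dict, Iterator


def kmers_overlapping(seq: str, pos: int, k: int) -> Iterator[str]:
    """Yield every length-k substring of seq that covers 0-based index pos."""
    first_start = max(0, pos - k + 1)
    last_start = min(pos, len(seq) - k)
    for start in range(first_start, last_start + 1):
        yield seq[start:start + k]


def delta_svm(ref_seq: str, pos: int, alt_base: str,
              weights: Dict[str, float], k: int) -> float:
    """Score a single-nucleotide variant as sum(alt k-mer weights) - sum(ref k-mer weights).

    weights is a hypothetical lookup table; k-mers missing from it count as 0.
    """
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    ref_score = sum(weights.get(km, 0.0) for km in kmers_overlapping(ref_seq, pos, k))
    alt_score = sum(weights.get(km, 0.0) for km in kmers_overlapping(alt_seq, pos, k))
    return alt_score - ref_score


if __name__ == "__main__":
    # Toy weights for illustration only; real weights would come from a trained model.
    toy_weights = {"ACG": 1.0, "CGT": 0.5, "GTA": -0.25, "ACT": -1.0}
    print(delta_svm("ACGTA", pos=2, alt_base="T", weights=toy_weights, k=3))
```

A negative value here would indicate that the alternate allele removes sequence features the model associates with accessible chromatin, and a positive value the reverse.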

Original language: English (US)
Pages (from-to): 1251-1258
Number of pages: 8
Journal: Human Mutation
Issue number: 9
State: Published - Sep 2017


  • MPRA
  • eQTL analysis
  • enhancers
  • gene regulation
  • machine learning
  • regulatory variation

ASJC Scopus subject areas

  • Genetics
  • Genetics (clinical)

