Differential expression of single-cell RNA-seq data using Tweedie models

Himel Mallick, Suvo Chatterjee, Shrabanti Chowdhury, Saptarshi Chatterjee, Ali Rahnavard, Stephanie C. Hicks

Research output: Contribution to journal › Article › peer-review

Abstract

The performance of computational methods and software to identify differentially expressed features in single-cell RNA-sequencing (scRNA-seq) has been shown to be influenced by several factors, including the choice of normalization method and the choice of experimental platform (or library preparation protocol) used to profile gene expression in individual cells. Currently, it is up to the practitioner to choose the most appropriate differential expression (DE) method out of over 100 DE tools available to date, each relying on its own assumptions to model scRNA-seq expression features. To model the technological variability in cross-platform scRNA-seq data, here we propose to use Tweedie generalized linear models that can flexibly capture a large dynamic range of observed scRNA-seq expression profiles across experimental platforms induced by platform- and gene-specific statistical properties such as heavy tails, sparsity, and gene expression distributions. We also propose a zero-inflated Tweedie model that allows the probability mass at zero to exceed that of a traditional Tweedie distribution, in order to model zero-inflated scRNA-seq data with excessive zero counts. Using both synthetic and published plate- and droplet-based scRNA-seq datasets, we perform a systematic benchmark evaluation of more than 10 representative DE methods and demonstrate that our method (Tweedieverse) outperforms the state-of-the-art DE approaches across experimental platforms in terms of statistical power and false discovery rate control. Our open-source software (R/Bioconductor package) is available at https://github.com/himelmallick/Tweedieverse.
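To illustrate the zero-inflation idea in the abstract, the following is a minimal sketch, not the Tweedieverse implementation (which is an R package). It assumes the standard compound Poisson-gamma representation of a Tweedie distribution with power parameter 1 < p < 2, mean `mu`, and dispersion `phi`, and it adds a point mass `pi0` at zero. All function and parameter names here are hypothetical and chosen for illustration only.

```python
import math
import random


def compound_poisson_params(mu, phi, p):
    """Map Tweedie (mu, phi, p) with 1 < p < 2 to compound Poisson-gamma parameters.

    Returns (lam, shape, scale): the Poisson rate, and the shape and scale of
    each gamma summand. Assumes the standard textbook parameterization.
    """
    if not 1.0 < p < 2.0:
        raise ValueError("this sketch only covers 1 < p < 2")
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))
    shape = (2.0 - p) / (p - 1.0)
    scale = phi * (p - 1.0) * mu ** (p - 1.0)
    return lam, shape, scale


def tweedie_zero_prob(mu, phi, p):
    """P(Y = 0) under the plain Tweedie model: no gamma summands are drawn."""
    lam, _, _ = compound_poisson_params(mu, phi, p)
    return math.exp(-lam)


def zi_tweedie_zero_prob(mu, phi, p, pi0):
    """P(Y = 0) under a zero-inflated mixture with extra zero mass pi0."""
    return pi0 + (1.0 - pi0) * tweedie_zero_prob(mu, phi, p)


def _poisson(lam, rng):
    """Draw a Poisson variate with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_zi_tweedie(mu, phi, p, pi0, rng):
    """Draw one value from the zero-inflated Tweedie mixture."""
    if rng.random() < pi0:
        return 0.0  # structural (excess) zero
    lam, shape, scale = compound_poisson_params(mu, phi, p)
    n = _poisson(lam, rng)
    # Sum of n gamma variates; an empty sum gives an exact zero.
    return math.fsum(rng.gammavariate(shape, scale) for _ in range(n))


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_zi_tweedie(2.0, 1.0, 1.5, 0.3, rng) for _ in range(20000)]
    print("plain Tweedie P(0):", tweedie_zero_prob(2.0, 1.0, 1.5))
    print("zero-inflated P(0):", zi_tweedie_zero_prob(2.0, 1.0, 1.5, 0.3))
    print("empirical zero fraction:", sum(d == 0.0 for d in draws) / len(draws))
```

Under these assumptions, the plain Tweedie model already places positive mass exp(-lam) at zero, and the mixture weight `pi0` raises that mass further without changing the distribution of the nonzero values.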

Original language: English (US)
Pages (from-to): 3492-3510
Number of pages: 19
Journal: Statistics in Medicine
Volume: 41
Issue number: 18
State: Published - Aug 15 2022

Keywords

  • Tweedie distribution
  • differential expression
  • exponential dispersion model
  • generalized linear model
  • single-cell RNA-sequencing
  • zero-inflation

ASJC Scopus subject areas

  • Epidemiology
  • Statistics and Probability
