TY - JOUR
T1 - Differential expression of single-cell RNA-seq data using Tweedie models
AU - Mallick, Himel
AU - Chatterjee, Suvo
AU - Chowdhury, Shrabanti
AU - Chatterjee, Saptarshi
AU - Rahnavard, Ali
AU - Hicks, Stephanie C.
N1 - Publisher Copyright: © 2022 John Wiley & Sons Ltd. This article has been contributed to by U.S. Government employees and their work is in the public domain in the USA.
PY - 2022/8/15
Y1 - 2022/8/15
N2 - The performance of computational methods and software to identify differentially expressed features in single-cell RNA-sequencing (scRNA-seq) has been shown to be influenced by several factors, including the choice of normalization method and of the experimental platform (or library preparation protocol) used to profile gene expression in individual cells. Currently, it is up to the practitioner to choose the most appropriate differential expression (DE) method out of over 100 DE tools available to date, each relying on its own assumptions to model scRNA-seq expression features. To model the technological variability in cross-platform scRNA-seq data, here we propose to use Tweedie generalized linear models that can flexibly capture a large dynamic range of observed scRNA-seq expression profiles across experimental platforms induced by platform- and gene-specific statistical properties such as heavy tails, sparsity, and gene expression distributions. We also propose a zero-inflated Tweedie model that allows the zero probability mass to exceed that of a traditional Tweedie distribution, in order to model zero-inflated scRNA-seq data with excessive zero counts. Using both synthetic and published plate- and droplet-based scRNA-seq datasets, we perform a systematic benchmark evaluation of more than 10 representative DE methods and demonstrate that our method (Tweedieverse) outperforms state-of-the-art DE approaches across experimental platforms in terms of statistical power and false discovery rate control. Our open-source software (R/Bioconductor package) is available at https://github.com/himelmallick/Tweedieverse.
KW - Tweedie distribution
KW - differential expression
KW - exponential dispersion model
KW - generalized linear model
KW - single-cell RNA-sequencing
KW - zero-inflation
UR - http://www.scopus.com/inward/record.url?scp=85131152537&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85131152537&partnerID=8YFLogxK
U2 - 10.1002/sim.9430
DO - 10.1002/sim.9430
M3 - Article
C2 - 35656596
AN - SCOPUS:85131152537
SN - 0277-6715
VL - 41
SP - 3492
EP - 3510
JO - Statistics in Medicine
JF - Statistics in Medicine
IS - 18
ER -