## Abstract

In a modern observational study based on healthcare databases, the numbers of observations and of predictors are typically on the order of $10^{5}$–$10^{6}$ and $10^{4}$–$10^{5}$, respectively. Despite the large sample size, the data rarely provide sufficient information to reliably estimate such a large number of parameters. Sparse regression techniques provide potential solutions, one notable approach being the Bayesian method based on shrinkage priors. In the “large n and large p” setting, however, the required posterior computation encounters a bottleneck at repeated sampling from a high-dimensional Gaussian distribution, whose precision matrix $\Phi$ is expensive to compute and factorize. In this article, we present a novel algorithm to accelerate this step based on the following observation: we can cheaply generate a random vector $b$ such that the solution to the linear system $\Phi \beta = b$ has the desired Gaussian distribution. We can then solve the linear system by the conjugate gradient (CG) algorithm through matrix-vector multiplications by $\Phi$; this involves no explicit factorization or calculation of $\Phi$ itself. Rapid convergence of CG in this context is guaranteed by the theory of prior-preconditioning we develop. We apply our algorithm to a clinically relevant large-scale observational study with (Formula presented.) patients and (Formula presented.) clinical covariates, designed to assess the relative risk of adverse events from two alternative blood anticoagulants. Our algorithm demonstrates an order of magnitude speed-up in posterior inference, in our case cutting the computation time from two weeks to less than a day. Supplementary materials for this article are available online.
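The sampling idea described above can be illustrated with a minimal NumPy sketch. The abstract does not state the form of the precision matrix, so the code assumes one common in Bayesian sparse regression: $\Phi = X^\top \Omega X + D^{-2}$, where $\Omega$ is a diagonal weight matrix and $D$ is a diagonal matrix of prior standard deviations, with target distribution $N(\Phi^{-1} X^\top \Omega y, \Phi^{-1})$. The function name `sample_gaussian_pcg`, its arguments, and the choice of $D^{2}$ as the preconditioner are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def sample_gaussian_pcg(X, omega, y, prior_sd, rng, tol=1e-10, max_iter=1000):
    """Draw beta ~ N(Phi^{-1} X' Omega y, Phi^{-1}) without forming Phi.

    Assumed form (not taken from the abstract):
        Phi = X' diag(omega) X + diag(prior_sd ** -2)
    Returns the draw, the right-hand side b, and the CG iteration count.
    """
    n, p = X.shape

    def phi_matvec(v):
        # Phi @ v using only matrix-vector products with X and X'.
        return X.T @ (omega * (X @ v)) + v / prior_sd**2

    # Cheap random right-hand side with mean X' Omega y and covariance Phi,
    # so that the solution of Phi beta = b has the target distribution.
    eta = rng.standard_normal(n)
    delta = rng.standard_normal(p)
    b = X.T @ (omega * y) + X.T @ (np.sqrt(omega) * eta) + delta / prior_sd

    # Preconditioned CG; the preconditioner is the prior precision D^{-2},
    # so applying its inverse is an elementwise multiply by prior_sd ** 2.
    beta = np.zeros(p)
    r = b.copy()                 # residual b - Phi @ beta at beta = 0
    z = prior_sd**2 * r          # preconditioned residual
    d = z.copy()                 # search direction
    rz = r @ z
    b_norm = np.linalg.norm(b)
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        phi_d = phi_matvec(d)
        alpha = rz / (d @ phi_d)
        beta += alpha * d
        r -= alpha * phi_d
        if np.linalg.norm(r) <= tol * b_norm:
            break
        z = prior_sd**2 * r
        rz_new = r @ z
        d = z + (rz_new / rz) * d
        rz = rz_new
    return beta, b, n_iter
```

Calling `sample_gaussian_pcg(X, omega, y, prior_sd, np.random.default_rng(0))` returns one draw; under the stated assumptions, each CG iteration costs two matrix-vector products with the design matrix, and no $p \times p$ matrix is ever stored.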

| Original language | English (US) |
|---|---|
| Journal | Journal of the American Statistical Association |
| State | Accepted/In press - 2022 |

## Keywords

- Big data
- Conjugate gradient
- Markov chain Monte Carlo
- Numerical linear algebra
- Sparse matrix
- Variable selection

## ASJC Scopus subject areas

- Statistics and Probability
- Statistics, Probability and Uncertainty