Accurate assignment of disease liability to genetic variants using only population data

Research output: Contribution to journal › Article › peer-review

Abstract

Purpose: The growing size of public variant repositories prompted us to test how accurately the pathogenicity of DNA variants can be predicted from population data alone.

Methods: Under the a priori assumption that the ratios of variant prevalence in healthy vs affected populations form 2 distinct distributions (pathogenic and benign), we used a Bayesian method to assign each variant a probability of belonging to either distribution.

Results: The approach, termed Bayesian prevalence ratio (BayPR), correctly parsed 300 of 313 expertly curated CFTR variants: 284 of 296 pathogenic/likely pathogenic variants fell into 1 distribution and 16 of 17 benign/likely benign variants into the other. For 103 functionally confirmed missense CFTR variants, BayPR achieved an area under the receiver operating characteristic curve (AUC) of 0.99, equal to or exceeding that of 10 commonly used algorithms (AUC range = 0.54-0.99). Applied to expertly curated variants in 8 genes associated with 7 Mendelian conditions, BayPR assigned a disease-causing probability of ≥80% to 1350 of 1374 (98.3%) pathogenic/likely pathogenic variants and of ≤20% to 22 of 23 (95.7%) benign/likely benign variants.

Conclusion: Irrespective of variant type or functional effect, the BayPR approach provides probabilities of pathogenicity for DNA variants responsible for Mendelian disorders using only the variant counts in affected and unaffected population samples.
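The Bayesian step summarized in the abstract (prevalence ratios assumed to form a pathogenic and a benign distribution, with each variant assigned a probability of belonging to either) can be illustrated with a generic two-component Bayesian mixture. This is a minimal sketch, not the authors' BayPR implementation. Several elements are hypothetical placeholders: the Gaussian shape of each component on the log prevalence ratio, the component means and standard deviations, the 0.5 prior, the pseudocount and all function names. In a real analysis the component parameters would be estimated from data rather than fixed by hand. The ≥0.8 and ≤0.2 cutoffs mirror the thresholds reported in the abstract.

```python
import math


def prevalence_ratio(carriers_healthy, n_healthy, carriers_affected, n_affected,
                     pseudocount=0.5):
    """Ratio of a variant's prevalence in unaffected vs affected samples.

    The pseudocount (an assumption, not taken from the paper) keeps the
    ratio finite when a variant is absent from one of the samples.
    """
    prev_healthy = (carriers_healthy + pseudocount) / (n_healthy + 2 * pseudocount)
    prev_affected = (carriers_affected + pseudocount) / (n_affected + 2 * pseudocount)
    return prev_healthy / prev_affected


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def posterior_pathogenic(ratio, prior_pathogenic=0.5,
                         path_mean=-4.0, path_sd=1.5,
                         benign_mean=0.0, benign_sd=1.0):
    """P(pathogenic | ratio) under a hypothetical two-Gaussian model on log(ratio).

    Pathogenic variants are rarer in healthy than in affected samples, so their
    log ratio sits well below 0. Benign variants occur at similar frequencies,
    so their log ratio clusters near 0. All parameter values are placeholders.
    """
    x = math.log(ratio)
    weighted_path = prior_pathogenic * normal_pdf(x, path_mean, path_sd)
    weighted_benign = (1 - prior_pathogenic) * normal_pdf(x, benign_mean, benign_sd)
    # Bayes' rule: normalise the two prior-weighted likelihoods.
    return weighted_path / (weighted_path + weighted_benign)


def classify(prob, high=0.8, low=0.2):
    """Bin a posterior using the >=80% / <=20% cutoffs reported in the abstract."""
    if prob >= high:
        return "disease-causing"
    if prob <= low:
        return "benign"
    return "indeterminate"


if __name__ == "__main__":
    # Made-up counts for illustration only.
    rare_in_healthy = prevalence_ratio(1, 10_000, 200, 1_000)
    similar_in_both = prevalence_ratio(500, 10_000, 50, 1_000)
    for ratio in (rare_in_healthy, similar_in_both):
        p = posterior_pathogenic(ratio)
        print(f"ratio={ratio:.4g}  P(pathogenic)={p:.3f}  -> {classify(p)}")
```

With these made-up counts, the first variant (rare in healthy carriers, common in affected ones) should land in the disease-causing bin and the second (similar prevalence in both groups) in the benign bin. Fixing the mixture parameters by hand keeps the sketch short. A faithful reproduction would fit them, and the prior, to the observed distribution of ratios.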

Original language: English (US)
Pages (from-to): 87-99
Number of pages: 13
Journal: Genetics in Medicine
Volume: 24
Issue number: 1
State: Published - Jan 2022

Keywords

  • Bayesian analysis
  • Population frequency
  • Prevalence ratio
  • Variant classification
  • Variant interpretation

ASJC Scopus subject areas

  • Genetics (clinical)
