A large-scale quantitative analysis of latent factors and sentiment in online doctor reviews

Byron C. Wallace, Michael J. Paul, Urmimala Sarkar, Thomas A. Trikalinos, Mark Dredze

Research output: Contribution to journal › Article › peer-review


Online physician reviews are a massive and potentially rich source of information capturing patient sentiment regarding healthcare. We analyze a corpus comprising nearly 60 000 such reviews with a state-of-the-art probabilistic model of text. We describe a probabilistic generative model that captures latent sentiment across aspects of care (e.g., interpersonal manner). We target specific aspects by leveraging a small set of manually annotated reviews. We perform regression analysis to assess whether model output improves correlation with state-level measures of healthcare. We report both qualitative and quantitative results. Model output correlates with state-level measures of healthcare quality, including patient likelihood of visiting their primary care physician within 14 days of discharge (p=0.03), and including output from the proposed model improves prediction of this outcome (p=0.10). We find similar results for healthcare expenditure. Generative models of text can recover important information from online physician reviews, facilitating large-scale analyses of such reviews.

Original language: English (US)
Pages (from-to): 1098-1103
Number of pages: 6
Journal: Journal of the American Medical Informatics Association
Issue number: 6
State: Published - Jun 10 2014

ASJC Scopus subject areas

  • Health Informatics

