A data-mining framework for large scale analysis of dose-outcome relationships in a database of irradiated head and neck cancer patients

Scott P. Robertson, Harry Quon, Ana P. Kiess, Joseph A. Moore, Wuyang Yang, Zhi Cheng, Sarah Afonso, Mysha Allen, Marian Richardson, Amanda Choflet, Andrew Sharabi, Todd R. McNutt

Research output: Contribution to journal › Article › peer-review

32 Scopus citations


Purpose: To develop a hypothesis-generating framework for automatic extraction of dose-outcome relationships from an in-house, analytic oncology database.
Methods: Dose-volume histograms (DVH) and clinical outcomes have been routinely stored in the authors' database for 684 head and neck cancer patients treated from 2007 to 2014. Database queries were developed to extract outcomes that had been assessed for at least 100 patients, as well as DVH curves for organs-at-risk (OAR) that were contoured for at least 100 patients. DVH curves for paired OAR (e.g., left and right parotids) were automatically combined and included as additional structures for analysis. For each OAR-outcome combination, only patients with both OAR and outcome records were analyzed. DVH dose points, D(Vt), at a given normalized volume threshold Vt were stratified into two groups based on the severity of toxicity outcomes after treatment completion. The probability of an outcome was modeled by logistic regression at each Vt = 0%, 1%, ..., 100%. Notable OAR-outcome combinations were defined as those having statistically significant regression parameters (p < 0.05) and an odds ratio of at least 1.05 (a 5% increase in odds per Gy).
Results: A total of 57 individual and combined structures and 97 outcomes were queried from the database. Of all possible OAR-outcome combinations, 17% resulted in significant logistic regression fits (p < 0.05) with an odds ratio of at least 1.05. Further manual inspection revealed a number of reasonable models based on either reported literature or proximity between neighboring OAR. The data-mining algorithm confirmed the following well-known OAR-dose/outcome relationships: dysphagia/larynx, voice changes/larynx, esophagitis/esophagus, xerostomia/parotid glands, and mucositis/oral mucosa. Several surrogate relationships, defined as those in which the OAR is not directly attributed to the outcome, were also observed, including esophagitis/larynx, mucositis/mandible, and xerostomia/mandible.
Conclusions: Prospective collection of clinical data has enabled large-scale analysis of dose-outcome relationships. The current data-mining framework revealed both known and novel dosimetric and clinical relationships, underscoring the potential utility of this analytic approach in hypothesis generation. Multivariate models and advanced, 3D dosimetric features may be necessary to further evaluate the complex relationship between neighboring OAR and observed outcomes.
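
The screening step described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the data layout (one list of dose and volume samples per patient), the linear interpolation used to read D(Vt) from a cumulative DVH, and the toxicity grade cutoff used to form the two groups are all assumptions made for illustration. The abstract does not state the cutoff or the fitting software, so a plain Newton-Raphson fit with a Wald test on the slope stands in for the logistic regression.

```python
import math


def dose_at_volume(doses_gy, volumes_pct, vt_pct):
    """Return D(Vt): the dose at which a cumulative DVH drops to vt_pct.

    Assumes doses_gy is increasing and volumes_pct is non-increasing,
    starting at 100 (normalized cumulative DVH). Linear interpolation
    between samples is an assumption of this sketch.
    """
    for i in range(1, len(doses_gy)):
        if volumes_pct[i] < vt_pct:
            v_hi, v_lo = volumes_pct[i - 1], volumes_pct[i]
            frac = (v_hi - vt_pct) / (v_hi - v_lo)
            return doses_gy[i - 1] + frac * (doses_gy[i] - doses_gy[i - 1])
    return doses_gy[-1]  # volume never drops below vt_pct: use last sample


def fit_logistic(doses, events, max_iter=50, tol=1e-8):
    """Fit P(event) = logistic(b0 + b1 * dose) by Newton-Raphson.

    Returns (odds_ratio_per_gy, wald_p_value) for the slope, or None when
    the fit is degenerate (no data, constant dose) or does not converge.
    """
    n = len(doses)
    if n == 0:
        return None
    mean = sum(doses) / n
    xc = [d - mean for d in doses]  # centering leaves the slope unchanged
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xc, events):
            eta = max(-30.0, min(30.0, b0 + b1 * x))  # avoid exp overflow
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            g0 += y - p          # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w             # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            return None
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            z = b1 / math.sqrt(h00 / det)  # Wald statistic for the slope
            return math.exp(b1), math.erfc(abs(z) / math.sqrt(2.0))
    return None


def is_notable(fit, alpha=0.05, min_odds_ratio=1.05):
    """Apply the abstract's criteria: p < 0.05 and odds ratio >= 1.05."""
    return fit is not None and fit[1] < alpha and fit[0] >= min_odds_ratio


def scan_thresholds(dvhs, grades, grade_cutoff=2):
    """Fit one model per Vt = 0%, 1%, ..., 100% for a single OAR-outcome pair.

    dvhs is a list of (doses_gy, volumes_pct) tuples, one per patient.
    grade_cutoff is a hypothetical severity threshold, not from the source.
    """
    events = [1 if g >= grade_cutoff else 0 for g in grades]
    results = {}
    for vt in range(0, 101):
        d_vt = [dose_at_volume(d, v, vt) for d, v in dvhs]
        results[vt] = fit_logistic(d_vt, events)
    return results
```

Under these assumptions, calling `scan_thresholds` once per OAR-outcome pair and filtering its results with `is_notable` would reproduce the kind of univariate screen the abstract describes. Because each pair is tested at 101 thresholds, the flagged relationships are best read as hypotheses for follow-up, in line with the framing in the Conclusions.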

Original language: English (US)
Pages (from-to): 4329-4337
Number of pages: 9
Journal: Medical Physics
Issue number: 7
State: Published - Jul 1 2015


Keywords

  • dose-outcome modeling
  • head and neck cancer
  • large-scale analytics
  • toxicity

ASJC Scopus subject areas

  • Biophysics
  • Radiology, Nuclear Medicine and Imaging

