Reliability of grading of vesicoureteral reflux and other findings on voiding cystourethrography

Anthony J. Schaeffer, Saul P. Greenfield, Anastasia Ivanova, Gang Cui, J. Michael Zerin, Jeanne S. Chow, Alejandro Hoberman, Ranjiv I. Mathews, Tej K. Mattoo, Myra A. Carpenter, Marva Moxey-Mims, Russell W. Chesney, Caleb P. Nelson

Research output: Contribution to journal › Article › peer-review

20 Scopus citations


Introduction: Voiding cystourethrography (VCUG) is the modality of choice to diagnose vesicoureteral reflux (VUR). Although grading of VUR is essential for prognosis and clinical decision-making, the inter-observer reliability for grading has been shown to vary substantially. The Randomized Intervention for Children with VesicoUreteral Reflux (RIVUR) trial provides a large cohort of children with VUR to better understand the reliability of VCUG findings.

Objective: To determine the inter-observer consistency of the grade of VUR and other VCUG findings in a large cohort of children with VUR.

Study design: The RIVUR trial is a randomized controlled trial of antimicrobial prophylaxis in children with VUR diagnosed after UTI. Each enrollment VCUG was read by a local clinical (i.e. non-reference) radiologist, and independently by two blinded RIVUR reference radiologists. Reference radiologists' disagreements were adjudicated for trial purposes. The grade of VUR and other VCUG findings were extracted from the local clinical radiologist's report. The unit of analysis included individual ureters and individual participants. We compared the three interpretations for grading of VUR and other VCUG findings to determine the inter-observer reliability.

Results: Six hundred and two non-reference radiology reports from 90 institutions were reviewed and yielded the grade of VUR for 560 left and 524 right ureters. All three radiologists agreed on VUR grade in only 59% of ureters; two of three agreed on 39% of ureters; and all three disagreed on 2% of ureters (Table). Agreement was better (≥92%) for other VCUG findings (e.g. bladder shape "normal"). The non-reference radiologists' grade of VUR differed from the reference radiologists' adjudicated grade by exactly one grade level in 19% of ureters, and by two or more grade levels in 2.2% of ureters. When the participant was the unit of analysis, all three radiologists agreed on the grade of VUR in both ureters in just 43% of cases.
Discussion: Our study shows considerable and clinically relevant variability in grading VUR by VCUG. This variability was consistent when comparing non-reference to the adjudicated reference radiologists' assessment and the reference radiologists to each other. This study was limited to children with a history of UTI and grade I-IV VUR and may not be generalizable to all children who have a VCUG.

Conclusion: The considerable inter-observer variability in VUR grading has both research and clinical implications, as study design, risk stratification, and clinical decision-making rely heavily on grades of VUR.

Table. Study summary.

Characteristic                                              Value
No. of VCUG reports analyzed                                602
Gender of participants
  Male                                                      49
  Female                                                    553
Age in months at time of VCUG (median) [IQR]                11 [5, 30]
No. of ureters analyzed                                     1081
Reflux grade agreement
  Between non-reference and each reference radiologist (three-way)
    All three agree                                         638/1081 (59%)
    Two agree, one disagree                                 417/1081 (39%)
    All three disagree                                      27 (2%)
  Between non-reference and adjudicated reference radiologists' score (two-way)
    Agree                                                   805 (75%)
    Disagree                                                275 (25%)
    Kappa (95% CI)                                          0.66 (0.62-0.69)
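
For readers unfamiliar with the agreement statistics reported above, the following is a minimal sketch of how a three-way agreement breakdown and a two-way kappa could be computed. It is not the authors' analysis code. The abstract does not state whether the reported kappa was weighted, so the sketch assumes plain (unweighted) Cohen's kappa. The grade lists and the names `three_way_agreement` and `cohen_kappa` are hypothetical and exist only for illustration.

```python
from collections import Counter

def three_way_agreement(readings):
    """Count ureters where all three, exactly two, or no readers agree."""
    counts = Counter(all_agree=0, two_agree=0, all_disagree=0)
    for grades in readings:
        distinct = len(set(grades))  # 1, 2, or 3 distinct grades
        if distinct == 1:
            counts["all_agree"] += 1
        elif distinct == 2:
            counts["two_agree"] += 1
        else:
            counts["all_disagree"] += 1
    return counts

def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters over the same items."""
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:  # both raters used a single identical category
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical VUR grades (1-4) for eight ureters; NOT data from the study.
local       = [1, 2, 2, 3, 3, 4, 2, 3]  # non-reference radiologist
reference_a = [1, 2, 3, 3, 3, 4, 2, 2]  # first reference radiologist
reference_b = [1, 2, 3, 3, 4, 4, 1, 4]  # second reference radiologist
adjudicated = [1, 2, 3, 3, 3, 4, 2, 2]  # assumed adjudicated reference grade

counts = three_way_agreement(zip(local, reference_a, reference_b))
kappa = cohen_kappa(local, adjudicated)

print(dict(counts))
print(round(kappa, 2))
```

Because VUR grades are ordinal, a weighted kappa that penalizes two-grade disagreements more than one-grade disagreements would be a reasonable alternative; the unweighted form is used here only because it is the simplest to show.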

Original language: English (US)
Journal: Journal of Pediatric Urology
State: Accepted/In press - Mar 10 2016


Keywords

  • Classification
  • Concordance
  • Radiology
  • Urinary tract infection
  • Vesico-ureteral reflux
  • Voiding cystourethrogram

ASJC Scopus subject areas

  • Pediatrics, Perinatology, and Child Health
  • Urology

