Development and validation of a deep learning model to quantify interstitial fibrosis and tubular atrophy from kidney ultrasonography images

Ambarish M. Athavale, Peter D. Hart, Mathew Itteera, David Cimbaluk, Tushar Patel, Anas Alabkaa, Jose Arruda, Ashok Singh, Avi Rosenberg, Hemant Kulkarni

Research output: Contribution to journal › Article › peer-review


Importance: Interstitial fibrosis and tubular atrophy (IFTA) is a strong indicator of decline in kidney function and is measured by histopathological assessment of a kidney biopsy core. At present, a noninvasive test to assess IFTA is not available.

Objective: To develop and validate a deep learning (DL) algorithm to quantify IFTA from kidney ultrasonography images.

Design, Setting, and Participants: This was a single-center diagnostic study of consecutive patients who underwent native kidney biopsy at John H. Stroger Jr. Hospital of Cook County, Chicago, Illinois, between January 1, 2014, and December 31, 2018. A DL algorithm was trained, validated, and tested to classify IFTA from kidney ultrasonography images. Of 6135 Crimmins-filtered ultrasonography images, 5523 were used for training (5122 images) and validation (401 images), and 612 were used to test the accuracy of the DL system. Kidney segmentation was performed using the UNet architecture, and classification was performed using a convolutional neural network-based feature extractor and extreme gradient boosting. IFTA scored by a nephropathologist on trichrome-stained kidney biopsy slides was used as the reference standard. IFTA was divided into 4 grades (grade 1, 0%-24%; grade 2, 25%-49%; grade 3, 50%-74%; and grade 4, 75%-100%). Data analysis was performed from December 2019 to May 2020.

Main Outcomes and Measures: Prediction of IFTA grade was measured using the metrics precision, recall, accuracy, and F1 score.

Results: This study included 352 patients (mean [SD] age 47.43 [14.37] years), of whom 193 (54.82%) were women. There were 159 patients with IFTA grade 1 (2701 ultrasonography images), 74 patients with IFTA grade 2 (1239 ultrasonography images), 41 patients with IFTA grade 3 (701 ultrasonography images), and 78 patients with IFTA grade 4 (1494 ultrasonography images). Kidney ultrasonography images were segmented with 91% accuracy. In the independent test set, the point estimates for the performance metrics at the image level were precision of 0.8927 (95% CI, 0.8682-0.9172), recall of 0.8037 (95% CI, 0.7722-0.8352), accuracy of 0.8675 (95% CI, 0.8406-0.8944), and an F1 score of 0.8389 (95% CI, 0.8098-0.8680). Corresponding estimates at the patient level were precision of 0.9003 (95% CI, 0.8644-0.9362), recall of 0.8421 (95% CI, 0.7984-0.8858), accuracy of 0.8955 (95% CI, 0.8589-0.9321), and an F1 score of 0.8639 (95% CI, 0.8228-0.9049). Accuracy at the patient level was highest for IFTA grade 1 and IFTA grade 4. The accuracy (approximately 90%) remained high irrespective of the timing of ultrasonography studies and the biopsy diagnosis. The predictive performance of the DL system did not show significant improvement when combined with baseline clinical characteristics.

Conclusions and Relevance: These findings suggest that a DL algorithm can accurately and independently predict IFTA from kidney ultrasonography images.
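The grade bins and the image-level confidence intervals reported above can be illustrated with a short sketch. It is not the authors' code. The function names `ifta_grade` and `wald_ci` are hypothetical, and the abstract does not say how the intervals were computed. The sketch assumes a normal-approximation (Wald) interval over the 612 test images, an assumption that agrees with the four image-level intervals to four decimal places but is not confirmed by the source.

```python
import math


def ifta_grade(percent: float) -> int:
    """Map an IFTA percentage to grade 1-4 using the bins in the abstract:
    0%-24% -> 1, 25%-49% -> 2, 50%-74% -> 3, 75%-100% -> 4.
    Assumption: non-integer values fall into the bin of their floor."""
    if not 0 <= percent <= 100:
        raise ValueError("IFTA percentage must lie in [0, 100]")
    return min(int(percent // 25) + 1, 4)


def wald_ci(p: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) interval for a proportion p observed on
    n samples. Assumed here; the abstract does not name the method."""
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width


if __name__ == "__main__":
    n_test_images = 612  # size of the independent test set in the abstract
    for name, p in [("precision", 0.8927), ("recall", 0.8037),
                    ("accuracy", 0.8675), ("F1", 0.8389)]:
        low, high = wald_ci(p, n_test_images)
        print(f"{name}: {p:.4f} (95% CI, {low:.4f}-{high:.4f})")
```

The same calculation is not applied to the patient-level estimates here, because the abstract does not give the number of patients in the test set.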

Original language: English (US)
Article number: 11176
Journal: JAMA Network Open
State: Accepted/In press - 2021
Externally published: Yes

ASJC Scopus subject areas

  • Medicine(all)

