TY - JOUR
T1 - Development and validation of a deep learning model to quantify interstitial fibrosis and tubular atrophy from kidney ultrasonography images
AU - Athavale, Ambarish M.
AU - Hart, Peter D.
AU - Itteera, Mathew
AU - Cimbaluk, David
AU - Patel, Tushar
AU - Alabkaa, Anas
AU - Arruda, Jose
AU - Singh, Ashok
AU - Rosenberg, Avi
AU - Kulkarni, Hemant
N1 - Publisher Copyright: © 2021 American Medical Association. All rights reserved.
PY - 2021
Y1 - 2021
N2 - Importance: Interstitial fibrosis and tubular atrophy (IFTA) is a strong indicator of decline in kidney function and is measured using histopathological assessment of kidney biopsy core. At present, a noninvasive test to assess IFTA is not available. Objective: To develop and validate a deep learning (DL) algorithm to quantify IFTA from kidney ultrasonography images. Design, Setting, and Participants: This was a single-center diagnostic study of consecutive patients who underwent native kidney biopsy at John H. Stroger Jr. Hospital of Cook County, Chicago, Illinois, between January 1, 2014, and December 31, 2018. A DL algorithm was trained, validated, and tested to classify IFTA from kidney ultrasonography images. Of 6135 Crimmins-filtered ultrasonography images, 5523 were used for training (5122 images) and validation (401 images), and 612 were used to test the accuracy of the DL system. Kidney segmentation was performed using the UNet architecture, and classification was performed using a convolutional neural network-based feature extractor and extreme gradient boosting. IFTA scored by a nephropathologist on trichrome-stained kidney biopsy slide was used as the reference standard. IFTA was divided into 4 grades (grade 1, 0%-24%; grade 2, 25%-49%; grade 3, 50%-74%; and grade 4, 75%-100%). Data analysis was performed from December 2019 to May 2020. Main Outcomes and Measures: Prediction of IFTA grade was measured using the metrics precision, recall, accuracy, and F1 score. Results: This study included 352 patients (mean [SD] age 47.43 [14.37] years), of whom 193 (54.82%) were women. There were 159 patients with IFTA grade 1 (2701 ultrasonography images), 74 patients with IFTA grade 2 (1239 ultrasonography images), 41 patients with IFTA grade 3 (701 ultrasonography images), and 78 patients with IFTA grade 4 (1494 ultrasonography images). Kidney ultrasonography images were segmented with 91% accuracy. In the independent test set, the point estimates for performance metrics showed precision of 0.8927 (95% CI, 0.8682-0.9172), recall of 0.8037 (95% CI, 0.7722-0.8352), accuracy of 0.8675 (95% CI, 0.8406-0.8944), and an F1 score of 0.8389 (95% CI, 0.8098-0.8680) at the image level. Corresponding estimates at the patient level were precision of 0.9003 (95% CI, 0.8644-0.9362), recall of 0.8421 (95% CI, 0.7984-0.8858), accuracy of 0.8955 (95% CI, 0.8589-0.9321), and an F1 score of 0.8639 (95% CI, 0.8228-0.9049). Accuracy at the patient level was highest for IFTA grade 1 and IFTA grade 4. The accuracy (approximately 90%) remained high irrespective of the timing of ultrasonography studies and the biopsy diagnosis. The predictive performance of the DL system did not show significant improvement when combined with baseline clinical characteristics. Conclusions and Relevance: These findings suggest that a DL algorithm can accurately and independently predict IFTA from kidney ultrasonography images.
UR - http://www.scopus.com/inward/record.url?scp=85106905491&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85106905491&partnerID=8YFLogxK
U2 - 10.1001/jamanetworkopen.2021.11176
DO - 10.1001/jamanetworkopen.2021.11176
M3 - Article
C2 - 34028548
AN - SCOPUS:85106905491
SN - 2574-3805
JO - JAMA Network Open
JF - JAMA Network Open
M1 - 11176
ER -