TY - JOUR
T1 - Using Machine Learning to Identify Metabolomic Signatures of Pediatric Chronic Kidney Disease Etiology
AU - for the CKD Biomarkers Consortium
AU - Lee, Arthur M.
AU - Hu, Jian
AU - Xu, Yunwen
AU - Abraham, Alison G.
AU - Xiao, Rui
AU - Coresh, Josef
AU - Rebholz, Casey
AU - Chen, Jingsha
AU - Rhee, Eugene P.
AU - Feldman, Harold I.
AU - Ramachandran, Vasan S.
AU - Kimmel, Paul L.
AU - Warady, Bradley A.
AU - Furth, Susan L.
AU - Denburg, Michelle R.
N1 - Funding Information:
This work was supported by Foundation for the National Institutes of Health grant U01DK106982 (CKD Biomarkers Consortium) and National Institute of Diabetes and Digestive and Kidney Diseases grant P50DK114786 (Children’s Hospital of Philadelphia Pediatric Center of Excellence in Nephrology).
Publisher Copyright:
© 2022 by the American Society of Nephrology
PY - 2022/2
Y1 - 2022/2
N2 - Background Untargeted plasma metabolomic profiling combined with machine learning (ML) may lead to discovery of metabolic profiles that inform our understanding of pediatric CKD causes. We sought to identify metabolomic signatures in pediatric CKD based on diagnosis: FSGS, obstructive uropathy (OU), aplasia/dysplasia/hypoplasia (A/D/H), and reflux nephropathy (RN). Methods Untargeted metabolomic quantification (GC-MS/LC-MS, Metabolon) was performed on plasma from 702 Chronic Kidney Disease in Children study participants (n: FSGS=63, OU=122, A/D/H=109, and RN=86). Lasso regression was used for feature selection, adjusting for clinical covariates. Four methods were then applied to stratify significance: logistic regression, support vector machine, random forest, and extreme gradient boosting. ML training was performed on 80% total cohort subsets and validated on 20% holdout subsets. Important features were selected based on being significant in at least two of the four modeling approaches. We additionally performed pathway enrichment analysis to identify metabolic subpathways associated with CKD cause. Results ML models were evaluated on holdout subsets with receiver-operator and precision-recall area-under-the-curve, F1 score, and Matthews correlation coefficient. ML models outperformed no-skill prediction. Metabolomic profiles were identified based on cause. FSGS was associated with the sphingomyelin-ceramide axis. FSGS was also associated with individual plasmalogen metabolites and the subpathway. OU was associated with gut microbiome–derived histidine metabolites. Conclusion ML models identified metabolomic signatures based on CKD cause. Using ML techniques in conjunction with traditional biostatistics, we demonstrated that sphingomyelin-ceramide and plasmalogen dysmetabolism are associated with FSGS and that gut microbiome–derived histidine metabolites are associated with OU.
AB - Background Untargeted plasma metabolomic profiling combined with machine learning (ML) may lead to discovery of metabolic profiles that inform our understanding of pediatric CKD causes. We sought to identify metabolomic signatures in pediatric CKD based on diagnosis: FSGS, obstructive uropathy (OU), aplasia/dysplasia/hypoplasia (A/D/H), and reflux nephropathy (RN). Methods Untargeted metabolomic quantification (GC-MS/LC-MS, Metabolon) was performed on plasma from 702 Chronic Kidney Disease in Children study participants (n: FSGS=63, OU=122, A/D/H=109, and RN=86). Lasso regression was used for feature selection, adjusting for clinical covariates. Four methods were then applied to stratify significance: logistic regression, support vector machine, random forest, and extreme gradient boosting. ML training was performed on 80% total cohort subsets and validated on 20% holdout subsets. Important features were selected based on being significant in at least two of the four modeling approaches. We additionally performed pathway enrichment analysis to identify metabolic subpathways associated with CKD cause. Results ML models were evaluated on holdout subsets with receiver-operator and precision-recall area-under-the-curve, F1 score, and Matthews correlation coefficient. ML models outperformed no-skill prediction. Metabolomic profiles were identified based on cause. FSGS was associated with the sphingomyelin-ceramide axis. FSGS was also associated with individual plasmalogen metabolites and the subpathway. OU was associated with gut microbiome–derived histidine metabolites. Conclusion ML models identified metabolomic signatures based on CKD cause. Using ML techniques in conjunction with traditional biostatistics, we demonstrated that sphingomyelin-ceramide and plasmalogen dysmetabolism are associated with FSGS and that gut microbiome–derived histidine metabolites are associated with OU.
UR - http://www.scopus.com/inward/record.url?scp=85123968919&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123968919&partnerID=8YFLogxK
U2 - 10.1681/ASN.2021040538
DO - 10.1681/ASN.2021040538
M3 - Article
C2 - 35017168
AN - SCOPUS:85123968919
SN - 1046-6673
VL - 33
SP - 375
EP - 386
JO - Journal of the American Society of Nephrology : JASN
JF - Journal of the American Society of Nephrology : JASN
IS - 2
ER -