TY - JOUR
T1 - The absence of longitudinal data limits the accuracy of high-throughput clinical phenotyping for identifying type 2 diabetes mellitus subjects
AU - Wei, Wei Qi
AU - Leibson, Cynthia L.
AU - Ransom, Jeanine E.
AU - Kho, Abel N.
AU - Chute, Christopher G.
N1 - Funding Information:
This study was supported by the Biomedical Informatics and Computational Biology Graduate Traineeship Program, University of Minnesota; the eMERGE project, NIH U01 HG04599; and the Strategic Health IT Advanced Research Projects program, #90TR0002-01Z-02. We wish to acknowledge fruitful discussions with Dr. High Seng Chai and Dr. Pedro Caraballo.
PY - 2013/4
Y1 - 2013/4
N2 - Purpose: To evaluate the impact of insufficient longitudinal data on the accuracy of a high-throughput clinical phenotyping (HTCP) algorithm for identifying (1) patients with type 2 diabetes mellitus (T2DM) and (2) patients with no diabetes. Methods: Retrospective study conducted at Mayo Clinic in Rochester, Minnesota. Eligible subjects were Olmsted County residents with ≥1 Mayo Clinic encounter in each of three time periods: (1) 2007, (2) from 1997 through 2006, and (3) before 1997 (N = 54,283). Diabetes-relevant electronic medical record (EMR) data about diagnoses, laboratories, and medications were used. We employed the HTCP algorithm to categorize individuals as T2DM cases and non-diabetes controls. Considering the full 11 years (1997-2007) as the gold standard, we compared gold-standard categorizations with those using data for 10 subsequent intervals, ranging from 1998-2007 (10-year data) to 2007 (1-year data). Positive predictive values (PPVs) and false-negative rates (FNRs) were calculated. McNemar tests were used to determine whether categorizations using shorter time periods differed from the gold standard. Statistical significance was defined as P < 0.05. Results: We identified 2770 T2DM cases and 21,005 controls when the algorithm was applied using 11-year data. Using 2007 data alone, PPVs and FNRs, respectively, were 70% and 25% for case identification and 59% and 67% for control identification. All time frames differed significantly from the gold standard, except for the 10-year period. Conclusions: The accuracy of the algorithm decreased remarkably as data were limited to shorter observation periods. This impact should be considered carefully when designing/executing HTCP algorithms.
AB - Purpose: To evaluate the impact of insufficient longitudinal data on the accuracy of a high-throughput clinical phenotyping (HTCP) algorithm for identifying (1) patients with type 2 diabetes mellitus (T2DM) and (2) patients with no diabetes. Methods: Retrospective study conducted at Mayo Clinic in Rochester, Minnesota. Eligible subjects were Olmsted County residents with ≥1 Mayo Clinic encounter in each of three time periods: (1) 2007, (2) from 1997 through 2006, and (3) before 1997 (N = 54,283). Diabetes-relevant electronic medical record (EMR) data about diagnoses, laboratories, and medications were used. We employed the HTCP algorithm to categorize individuals as T2DM cases and non-diabetes controls. Considering the full 11 years (1997-2007) as the gold standard, we compared gold-standard categorizations with those using data for 10 subsequent intervals, ranging from 1998-2007 (10-year data) to 2007 (1-year data). Positive predictive values (PPVs) and false-negative rates (FNRs) were calculated. McNemar tests were used to determine whether categorizations using shorter time periods differed from the gold standard. Statistical significance was defined as P < 0.05. Results: We identified 2770 T2DM cases and 21,005 controls when the algorithm was applied using 11-year data. Using 2007 data alone, PPVs and FNRs, respectively, were 70% and 25% for case identification and 59% and 67% for control identification. All time frames differed significantly from the gold standard, except for the 10-year period. Conclusions: The accuracy of the algorithm decreased remarkably as data were limited to shorter observation periods. This impact should be considered carefully when designing/executing HTCP algorithms.
KW - Data aggregation
KW - Diabetes mellitus
KW - Electronic medical record
KW - Medical informatics
KW - Phenotype
KW - Research subject selection
UR - http://www.scopus.com/inward/record.url?scp=84875375245&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84875375245&partnerID=8YFLogxK
U2 - 10.1016/j.ijmedinf.2012.05.015
DO - 10.1016/j.ijmedinf.2012.05.015
M3 - Article
C2 - 22762862
AN - SCOPUS:84875375245
SN - 1386-5056
VL - 82
SP - 239
EP - 247
JO - International Journal of Medical Informatics
JF - International Journal of Medical Informatics
IS - 4
ER -