TY - JOUR
T1 - Mining the human phenome using semantic web technologies
T2 - a case study for Type 2 Diabetes.
AU - Pathak, Jyotishman
AU - Kiefer, Richard C.
AU - Bielinski, Suzette J.
AU - Chute, Christopher G.
PY - 2012
Y1 - 2012
N2 - The ability to conduct genome-wide association studies (GWAS) has enabled new exploration of how genetic variations contribute to health and disease etiology. However, historically GWAS have been limited by inadequate sample size due to associated costs for genotyping and phenotyping of study subjects. This has prompted several academic medical centers to form "biobanks" where biospecimens linked to personal health information, typically in electronic health records (EHRs), are collected and stored on large number of subjects. This provides tremendous opportunities to discover novel genotype-phenotype associations and foster hypothesis generation. In this work, we study how emerging Semantic Web technologies can be applied in conjunction with clinical and genotype data stored at the Mayo Clinic Biobank to mine the phenotype data for genetic associations. In particular, we demonstrate the role of using Resource Description Framework (RDF) for representing EHR diagnoses and procedure data, and enable federated querying via standardized Web protocols to identify subjects genotyped with Type 2 Diabetes for discovering gene-disease associations. Our study highlights the potential of Web-scale data federation techniques to execute complex queries.
AB - The ability to conduct genome-wide association studies (GWAS) has enabled new exploration of how genetic variations contribute to health and disease etiology. However, historically GWAS have been limited by inadequate sample size due to associated costs for genotyping and phenotyping of study subjects. This has prompted several academic medical centers to form "biobanks" where biospecimens linked to personal health information, typically in electronic health records (EHRs), are collected and stored on large number of subjects. This provides tremendous opportunities to discover novel genotype-phenotype associations and foster hypothesis generation. In this work, we study how emerging Semantic Web technologies can be applied in conjunction with clinical and genotype data stored at the Mayo Clinic Biobank to mine the phenotype data for genetic associations. In particular, we demonstrate the role of using Resource Description Framework (RDF) for representing EHR diagnoses and procedure data, and enable federated querying via standardized Web protocols to identify subjects genotyped with Type 2 Diabetes for discovering gene-disease associations. Our study highlights the potential of Web-scale data federation techniques to execute complex queries.
UR - http://www.scopus.com/inward/record.url?scp=84880798801&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84880798801&partnerID=8YFLogxK
M3 - Article
C2 - 23304343
AN - SCOPUS:84880798801
SN - 0309-1708
VL - 2012
SP - 699
EP - 708
JO - Advances in Water Resources
JF - Advances in Water Resources
ER -