TY - JOUR
T1 - Inverting the model of genomics data sharing with the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space
AU - AnVIL Team
AU - Schatz, Michael C.
AU - Philippakis, Anthony A.
AU - Afgan, Enis
AU - Banks, Eric
AU - Carey, Vincent J.
AU - Carroll, Robert J.
AU - Culotti, Alessandro
AU - Ellrott, Kyle
AU - Goecks, Jeremy
AU - Grossman, Robert L.
AU - Hall, Ira M.
AU - Hansen, Kasper D.
AU - Lawson, Jonathan
AU - Leek, Jeffrey T.
AU - O'Donnell-Luria, Anne
AU - Mosher, Stephen
AU - Morgan, Martin
AU - Nekrutenko, Anton
AU - O'Connor, Brian D.
AU - Osborn, Kevin
AU - Paten, Benedict
AU - Patterson, Candace
AU - Tan, Frederick J.
AU - Taylor, Casey Overby
AU - Vessio, Jennifer
AU - Waldron, Levi
AU - Wang, Ting
AU - Wuichet, Kristin
AU - Baumann, Alexander
AU - Rula, Andrew
AU - Kovalsy, Anton
AU - Bernard, Clare
AU - Caetano-Anollés, Derek
AU - Van der Auwera, Geraldine A.
AU - Canas, Justin
AU - Yuksel, Kaan
AU - Herman, Kate
AU - Taylor, M. Morgan
AU - Simeon, Marianie
AU - Baumann, Michael
AU - Wang, Qi
AU - Title, Robert
AU - Munshi, Ruchi
AU - Chaluvadi, Sushma
AU - Reeves, Valerie
AU - Disman, William
AU - Thomas, Salin
AU - Hajian, Allie
AU - Wheelan, Sarah J.
AU - Kammers, Kai
N1 - Funding Information:
This work is dedicated to the late James Peter Taylor, the Ralph S. O’Connor Professor of Biology and Computer Science at Johns Hopkins University, who was one of the original architects for the AnVIL and an ardent champion for open science (https://galaxyproject.org/jxtx). V.D.F., E.M.G., C.H., N.K., S.K.S., A.S., C.W., and K.L.W. provided substantial involvement and guidance for the project activities and contributed to the manuscript in their official roles as program coordinators for the NIH, NHGRI. The AnVIL is supported through cooperative agreement awards from NHGRI with co-funding from OD/ODSS to the Broad Institute (U24HG010262) and Johns Hopkins University (U24HG010263). The GDSCN is supported through a contract to Johns Hopkins University (75N92020P00235).
A.A.P. is a venture partner at GV and has received funding from Intel, IBM, Microsoft, Alphabet, and Bayer. D.B., E.A., J.G., J.C., and A.N. are founders of and hold equity in GalaxyWorks, LLC. The results of the study discussed in this publication could affect the value of GalaxyWorks, LLC. These arrangements have been reviewed and approved by the Johns Hopkins University, Oregon Health & Science University, and The Pennsylvania State University in accordance with their respective conflict of interest policies. V.C. has financial interest in Amazon, NVIDIA, and AMD.
Publisher Copyright:
© 2021 The Author(s)
PY - 2022/1/12
Y1 - 2022/1/12
AB - The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL; https://anvilproject.org) was developed to address a widespread community need for a unified computing environment for genomics data storage, management, and analysis. In this perspective, we present AnVIL, describe its ecosystem and interoperability with other platforms, and highlight how this platform and associated initiatives contribute to improved genomic data sharing efforts. The AnVIL is a federated cloud platform designed to manage and store genomics and related data, enable population-scale analysis, and facilitate collaboration through the sharing of data, code, and analysis results. By inverting the traditional model of data sharing, the AnVIL eliminates the need for data movement while also adding security measures for active threat detection and monitoring and provides scalable, shared computing resources for any researcher. We describe the core data management and analysis components of the AnVIL, which currently consists of Terra, Gen3, Galaxy, RStudio/Bioconductor, Dockstore, and Jupyter, and describe several flagship genomics datasets available within the AnVIL. We continue to extend and innovate the AnVIL ecosystem by implementing new capabilities, including mechanisms for interoperability and responsible data sharing, while streamlining access management. The AnVIL opens many new opportunities for analysis, collaboration, and data sharing that are needed to drive research and to make discoveries through the joint analysis of hundreds of thousands to millions of genomes along with associated clinical and molecular data types.
UR - http://www.scopus.com/inward/record.url?scp=85127480763&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85127480763&partnerID=8YFLogxK
U2 - 10.1016/j.xgen.2021.100085
DO - 10.1016/j.xgen.2021.100085
M3 - Review article
C2 - 35199087
AN - SCOPUS:85127480763
SN - 2666-979X
VL - 2
JO - Cell Genomics
JF - Cell Genomics
IS - 1
M1 - 100085
ER -