TY - GEN
T1 - SkyQuery
T2 - 24th International Conference on Scientific and Statistical Database Management, SSDBM 2012
AU - Dobos, László
AU - Budavári, Tamás
AU - Li, Nolan
AU - Szalay, Alexander S.
AU - Csabai, István
PY - 2012
Y1 - 2012
N2 - Multi-wavelength astronomical studies require cross-identification of detections of the same celestial objects in multiple catalogs based on spherical coordinates and other properties. Because of the large data volumes and spherical geometry, the symmetric N-way association of astronomical detections is a computationally intensive problem, even when sophisticated indexing schemes are used to exclude obviously false candidates. Legacy astronomical catalogs already contain detections of more than a hundred million objects, while ongoing and future surveys will produce catalogs of billions of objects with multiple detections of each at different times. One-time, pairwise cross-identification of these large catalogs is not sufficient for many astronomical scenarios. Consequently, a novel system is necessary that can cross-identify multiple catalogs on demand, efficiently and reliably. In this paper, we present our solution based on a cluster of commodity servers and ordinary relational databases. The cross-identification problems are formulated in a language based on SQL, but extended with special clauses. These special queries are partitioned spatially by coordinate ranges and compiled into a complex workflow of ordinary SQL queries. Workflows are then executed in a parallel framework using a cluster of servers hosting identical mirrors of the same data sets.
AB - Multi-wavelength astronomical studies require cross-identification of detections of the same celestial objects in multiple catalogs based on spherical coordinates and other properties. Because of the large data volumes and spherical geometry, the symmetric N-way association of astronomical detections is a computationally intensive problem, even when sophisticated indexing schemes are used to exclude obviously false candidates. Legacy astronomical catalogs already contain detections of more than a hundred million objects, while ongoing and future surveys will produce catalogs of billions of objects with multiple detections of each at different times. One-time, pairwise cross-identification of these large catalogs is not sufficient for many astronomical scenarios. Consequently, a novel system is necessary that can cross-identify multiple catalogs on demand, efficiently and reliably. In this paper, we present our solution based on a cluster of commodity servers and ordinary relational databases. The cross-identification problems are formulated in a language based on SQL, but extended with special clauses. These special queries are partitioned spatially by coordinate ranges and compiled into a complex workflow of ordinary SQL queries. Workflows are then executed in a parallel framework using a cluster of servers hosting identical mirrors of the same data sets.
KW - astronomical catalogs
KW - computational statistics
KW - probabilistic join
KW - query optimization and languages
KW - workflow
UR - http://www.scopus.com/inward/record.url?scp=84863433955&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84863433955&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-31235-9_10
DO - 10.1007/978-3-642-31235-9_10
M3 - Conference contribution
AN - SCOPUS:84863433955
SN - 9783642312342
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 159
EP - 167
BT - Scientific and Statistical Database Management - 24th International Conference, SSDBM 2012, Proceedings
Y2 - 25 June 2012 through 27 June 2012
ER -