Migrating a (large) science database to the cloud

Ani Thakar, Alex Szalay

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We report on attempts to put an existing scientific (astronomical) database, the Sloan Digital Sky Survey (SDSS) science archive [1], in the cloud. Based on our experience, it is at this time either very frustrating or impossible to migrate an existing, complex SQL Server database into current cloud service offerings such as Amazon EC2 and Microsoft SQL Azure. Migrating a large database in excess of a TB is certainly impossible, but even with (much) smaller databases, the limitations of cloud services make it very difficult to migrate the data without changing the schema and settings in ways that would invalidate performance comparisons between the cloud and on-premise versions (for example, we were unable to migrate a spatial indexing library and several other user-defined functions and stored procedures). It is therefore not surprising that our preliminary performance comparisons show a very large (order-of-magnitude) performance discrepancy with the Amazon cloud version of the SDSS database. We have also not yet investigated the performance tuning that may be possible within the cloud. Although we managed to migrate (a subset of) the SDSS catalog database to Amazon EC2, we were not able to access the database in a meaningful way from the outside world. Even though this was advertised as a public dataset on the AWS blog, it was not clear how other users or the public would be able to access this data meaningfully, if at all. These difficulties suggest that much work and coordination need to occur between cloud service providers and their potential database clients before science databases can be deployed successfully and effectively in the cloud. This is true not just for large scientific databases but for all databases that make extensive use of advanced database management system (DBMS) features for performance and user convenience.

Original language: English (US)
Title of host publication: HPDC 2010 - Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Pages: 430-434
Number of pages: 5
State: Published, 2010
Event: 19th ACM International Symposium on High Performance Distributed Computing, HPDC 2010, Chicago, IL, United States
Duration: Jun 21 2010 to Jun 25 2010

Publication series

Name: HPDC 2010 - Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing

Conference

Conference: 19th ACM International Symposium on High Performance Distributed Computing, HPDC 2010
Country/Territory: United States
City: Chicago, IL
Period: 6/21/10 to 6/25/10

Keywords

  • Cloud
  • Data in the cloud
  • Cloud computing
  • Databases
  • Scientific databases

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Software
