An architecture for a data-intensive computer

Edward Givelberg, Alexander Szalay, Kalin Kanov, Randal Burns

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Scientific instruments, as well as simulations, generate increasingly large datasets, changing the way we do science. We propose a system, which we call the data-intensive computer, for computing with petascale datasets. The data-intensive computer consists of an HPC cluster, a massively parallel database, and a set of computing servers running the data-intensive operating system, which turns the database into a layer in the memory hierarchy of the data-intensive computer. The data-intensive operating system is data-object-oriented: the abstract programming model of a sequential file, central to traditional computer operating systems, is replaced with system-level support for high-level data objects such as multi-dimensional arrays, graphs, and sparse arrays. User application programs will be compiled into code that is executed both on the HPC cluster and inside the database. The data-intensive operating system is, however, non-local: it allows remote applications to execute code inside the database. This model supports collaborative environments, in which a large dataset is typically created and processed by a large group of users. We are developing a software library, MPI-DB, which is a prototype of the data-intensive operating system. It is currently being used by the Turbulence group at JHU to store simulation output in the database and to perform simulations that refine previously stored results.
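The abstract does not describe MPI-DB's interface. As a minimal sketch of what system-level support for a multi-dimensional array object, in place of a sequential file, might look like, the Python below stores a 3-D array as fixed-size cubic chunks in a database table and reads a sub-box by fetching only the chunks that overlap it. Everything here is an assumption for illustration: the class name `ChunkedArrayStore`, the chunk size, the table layout, the array name `velocity_x`, and the use of SQLite as a single-node stand-in for the massively parallel database. It is not the MPI-DB API.

```python
import sqlite3
from array import array
from itertools import product


class ChunkedArrayStore:
    """Hypothetical 3-D float64 array kept as fixed-size cubic chunks in a DB table."""

    def __init__(self, conn, name, shape, chunk=4):
        self.conn, self.name, self.shape, self.chunk = conn, name, shape, chunk
        conn.execute(
            "CREATE TABLE IF NOT EXISTS chunks (name TEXT, ci INTEGER, cj INTEGER,"
            " ck INTEGER, data BLOB, PRIMARY KEY (name, ci, cj, ck))"
        )

    def _key(self, idx):
        # Chunk coordinates of the chunk containing global index (i, j, k).
        return tuple(x // self.chunk for x in idx)

    def _offset(self, idx):
        # Row-major position of (i, j, k) inside its chunk.
        c = self.chunk
        i, j, k = (x % c for x in idx)
        return (i * c + j) * c + k

    def _load(self, key):
        row = self.conn.execute(
            "SELECT data FROM chunks WHERE name=? AND ci=? AND cj=? AND ck=?",
            (self.name, *key),
        ).fetchone()
        buf = array("d")
        if row is None:
            buf.extend([0.0] * self.chunk ** 3)  # never-written chunks read as zeros
        else:
            buf.frombytes(row[0])
        return buf

    def _store(self, key, buf):
        self.conn.execute(
            "INSERT OR REPLACE INTO chunks VALUES (?, ?, ?, ?, ?)",
            (self.name, *key, buf.tobytes()),
        )

    def write(self, idx, value):
        # Bounds check against the declared array shape.
        if not all(0 <= x < n for x, n in zip(idx, self.shape)):
            raise IndexError(idx)
        # Read-modify-write of the single chunk that holds idx.
        key = self._key(idx)
        buf = self._load(key)
        buf[self._offset(idx)] = value
        self._store(key, buf)

    def read_box(self, lo, hi):
        """Return values in the half-open box [lo, hi) in row-major order.

        Only chunks overlapping the box are fetched, each at most once.
        """
        cache = {}
        out = []
        for idx in product(*(range(a, b) for a, b in zip(lo, hi))):
            key = self._key(idx)
            if key not in cache:
                cache[key] = self._load(key)
            out.append(cache[key][self._offset(idx)])
        return out


# Usage: an in-memory SQLite database stands in for the parallel database.
conn = sqlite3.connect(":memory:")
u = ChunkedArrayStore(conn, "velocity_x", shape=(16, 16, 16))
u.write((5, 6, 7), 1.5)
print(u.read_box((5, 6, 6), (6, 7, 9)))  # [0.0, 1.5, 0.0]
```

Chunking is one common way to map arrays onto database tables, since a sub-box query touches only the chunks it overlaps rather than scanning a whole file. A real system would also distribute chunks across database nodes and run code next to the data, both of which this single-node sketch omits.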

Original language: English (US)
Title of host publication: NDM'11 - Proceedings of the 2011 International Workshop on Network-Aware Data Management, Co-located with SC'11
Pages: 57-64
Number of pages: 8
State: Published - 2011
Event: International Workshop on Network-Aware Data Management, NDM'11, Held in Conjunction with the International Conference for High Performance Computing, Networking, Storage and Analysis, SC'11 - Seattle, WA, United States
Duration: Nov 14 2011 → Nov 14 2011

Publication series

Name: NDM'11 - Proceedings of the 2011 International Workshop on Network-Aware Data Management, Co-located with SC'11

Conference

Conference: International Workshop on Network-Aware Data Management, NDM'11, Held in Conjunction with the International Conference for High Performance Computing, Networking, Storage and Analysis, SC'11
Country/Territory: United States
City: Seattle, WA
Period: 11/14/11 → 11/14/11

Keywords

  • Design

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
