TY - GEN
T1 - An architecture for a data-intensive computer
AU - Givelberg, Edward
AU - Szalay, Alexander
AU - Kanov, Kalin
AU - Burns, Randal
PY - 2011
Y1 - 2011
N2 - Scientific instruments, as well as simulations, generate increasingly large datasets, changing the way we do science. We propose a system that we call the data-intensive computer for computing with Petascale-sized datasets. The data-intensive computer consists of an HPC cluster, a massively parallel database and a set of computing servers running the data-intensive operating system, which turns the database into a layer in the memory hierarchy of the data-intensive computer. The data-intensive operating system is data-object-oriented: the abstract programming model of a sequential file, central to traditional computer operating systems, is replaced with system-level support for high-level data objects, such as multi-dimensional arrays, graphs, and sparse arrays. User application programs will be compiled into code that is executed both on the HPC cluster and inside the database. The data-intensive operating system is, however, non-local, allowing remote applications to execute code inside the database. This model supports the collaborative environment, where a large data set is typically created and processed by a large group of users. We are developing a software library, MPI-DB, which is a prototype of the data-intensive operating system. It is currently being used by the Turbulence group at JHU to store simulation output in the database and to perform simulations refining previously stored results.
AB - Scientific instruments, as well as simulations, generate increasingly large datasets, changing the way we do science. We propose a system that we call the data-intensive computer for computing with Petascale-sized datasets. The data-intensive computer consists of an HPC cluster, a massively parallel database and a set of computing servers running the data-intensive operating system, which turns the database into a layer in the memory hierarchy of the data-intensive computer. The data-intensive operating system is data-object-oriented: the abstract programming model of a sequential file, central to traditional computer operating systems, is replaced with system-level support for high-level data objects, such as multi-dimensional arrays, graphs, and sparse arrays. User application programs will be compiled into code that is executed both on the HPC cluster and inside the database. The data-intensive operating system is, however, non-local, allowing remote applications to execute code inside the database. This model supports the collaborative environment, where a large data set is typically created and processed by a large group of users. We are developing a software library, MPI-DB, which is a prototype of the data-intensive operating system. It is currently being used by the Turbulence group at JHU to store simulation output in the database and to perform simulations refining previously stored results.
KW - Design
UR - http://www.scopus.com/inward/record.url?scp=84857989705&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84857989705&partnerID=8YFLogxK
U2 - 10.1145/2110217.2110226
DO - 10.1145/2110217.2110226
M3 - Conference contribution
AN - SCOPUS:84857989705
SN - 9781450311328
T3 - NDM'11 - Proceedings of the 2011 International Workshop on Network-Aware Data Management, Co-located with SC'11
SP - 57
EP - 64
BT - NDM'11 - Proceedings of the 2011 International Workshop on Network-Aware Data Management, Co-located with SC'11
T2 - International Workshop on Network-Aware Data Management, NDM'11, Held in Conjunction with the International Conference for High Performance Computing, Networking, Storage and Analysis, SC'11
Y2 - 14 November 2011 through 14 November 2011
ER -