TY - GEN
T1 - JAWS
T2 - 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2010
AU - Wang, Xiaodan
AU - Perlman, Eric
AU - Burns, Randal
AU - Malik, Tanu
AU - Budavári, Tamas
AU - Meneveau, Charles
AU - Szalay, Alexander
PY - 2010
Y1 - 2010
AB - We present JAWS, a job-aware, data-driven batch scheduler that improves query throughput for data-intensive scientific database clusters. As datasets reach petabyte-scale, workloads that scan through vast amounts of data to extract features are gaining importance in the sciences. However, acute performance bottlenecks result when multiple queries execute simultaneously and compete for I/O resources. Our solution, JAWS, divides queries into I/O-friendly sub-queries for scheduling. It then identifies overlapping data requirements within the workload and executes sub-queries in batches to maximize data sharing and reduce redundant I/O. JAWS extends our previous work [1] by supporting workflows in which queries exhibit data dependencies, exploiting workload knowledge to coordinate caching decisions, and combating starvation through adaptive and incremental trade-offs between query throughput and response time. Instrumenting JAWS in the Turbulence Database Cluster [2] yields nearly three-fold improvement in query throughput when contention in the workload is high.
UR - http://www.scopus.com/inward/record.url?scp=78650849510&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78650849510&partnerID=8YFLogxK
U2 - 10.1109/SC.2010.31
DO - 10.1109/SC.2010.31
M3 - Conference contribution
AN - SCOPUS:78650849510
SN - 9781424475575
T3 - 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2010
BT - 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2010
Y2 - 13 November 2010 through 19 November 2010
ER -