Just-in-time analytics on large file systems

H. Howie Huang, Nan Zhang, Wei Wang, Gautam Das, Alexander S. Szalay

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

As file systems reach the petabyte scale, users and administrators are increasingly interested in acquiring high-level analytical information for file management and analysis. Two particularly important tasks are the processing of aggregate and top-k queries, which, unfortunately, cannot be answered quickly by hierarchical file systems such as ext3 and NTFS. Existing pre-processing-based solutions, e.g., file system crawling and index building, consume a significant amount of time and space (for generating and maintaining the indexes), which in many cases cannot be justified by the infrequent usage of such solutions. In this paper, we advocate that user interests can often be sufficiently satisfied by approximate (i.e., statistically accurate) answers. We develop Glance, a just-in-time sampling-based system which, after consuming a small number of disk accesses, is capable of producing extremely accurate answers for a broad class of aggregate and top-k queries over a file system without requiring any prior knowledge. We use a number of real-world file systems to demonstrate the efficiency, accuracy and scalability of Glance.
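
The abstract does not spell out how Glance samples a file system, so the sketch below is only a generic illustration of how a sampling-based aggregate query can work, not the authors' algorithm. It estimates the total number of files under a directory by random descents: each descent follows one randomly chosen subdirectory per level and weights the files it sees by the inverse of the probability of reaching them. The function names `one_descent` and `estimate_file_count`, and the `descents` and `seed` parameters, are hypothetical and chosen for this example.

```python
import os
import random


def one_descent(root, rng):
    """Return one unbiased estimate of the number of regular files under root.

    Illustrative random-descent estimator (an assumption of this sketch,
    not taken from the paper): visit a single root-to-leaf directory path
    and scale each directory's file count by 1 / P(reaching that directory).
    """
    estimate = 0.0
    prob = 1.0  # probability that a random descent reaches `current`
    current = root
    while True:
        files = 0
        subdirs = []
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    subdirs.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    files += 1
        estimate += files / prob
        if not subdirs:
            return estimate
        subdirs.sort()  # fixed order so a seeded run is reproducible
        current = rng.choice(subdirs)
        prob /= len(subdirs)


def estimate_file_count(root, descents=100, seed=None):
    """Average several independent descents to reduce the variance."""
    rng = random.Random(seed)
    total = sum(one_descent(root, rng) for _ in range(descents))
    return total / descents
```

Each descent reads only one directory per level instead of crawling the whole tree, which is the sense in which a sampling approach trades exactness for a small number of disk accesses. Averaging more descents tightens the estimate; on a strongly skewed tree a plain estimator like this one can still have high variance.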

Original language: English (US)
Title of host publication: Proceedings of FAST 2011
Subtitle of host publication: 9th USENIX Conference on File and Storage Technologies
Publisher: USENIX Association
Pages: 217-230
Number of pages: 14
ISBN (Electronic): 9781931971829
State: Published - 2011
Event: 9th USENIX Conference on File and Storage Technologies, FAST 2011 - San Jose, United States
Duration: Feb 15 2011 to Feb 17 2011

Publication series

Name: Proceedings of FAST 2011: 9th USENIX Conference on File and Storage Technologies

Conference

Conference: 9th USENIX Conference on File and Storage Technologies, FAST 2011
Country/Territory: United States
City: San Jose
Period: 2/15/11 to 2/17/11

ASJC Scopus subject areas

  • Software
  • Computer Networks and Communications
  • Hardware and Architecture
