Pipe switch: Fast pipelined context switching for deep learning applications

Zhihao Bai, Zhen Zhang, Yibo Zhu, Xin Jin

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Deep learning (DL) workloads include throughput-intensive training tasks and latency-sensitive inference tasks. The dominant practice today is to provision dedicated GPU clusters for training and inference separately. Due to the need to meet strict Service-Level Objectives (SLOs), GPU clusters are often over-provisioned based on the peak load with limited sharing between applications and task types. We present PipeSwitch, a system that enables unused cycles of an inference application to be filled by training or other inference applications. It allows multiple DL applications to time-share the same GPU with the entire GPU memory and millisecond-scale switching overhead. With PipeSwitch, GPU utilization can be significantly improved without sacrificing SLOs. We achieve so by introducing pipelined context switching. The key idea is to leverage the layered structure of neural network models and their layer-by-layer computation pattern to pipeline model transmission over the PCIe and task execution in the GPU with model-aware grouping. We also design unified memory management and active-standby worker switching mechanisms to accompany the pipelining and ensure process-level isolation. We have built a PipeSwitch prototype and integrated it with PyTorch. Experiments on a variety of DL models and GPU cards show that PipeSwitch only incurs a task startup overhead of 3.6-6.6 ms and a total overhead of 5.4-34.6 ms (10-50× better than NVIDIA MPS), and achieves near 100% GPU utilization.

Original languageEnglish (US)
Title of host publicationProceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2020
PublisherUSENIX Association
Pages499-514
Number of pages16
ISBN (Electronic)9781939133199
StatePublished - 2020
Externally publishedYes
Event14th USENIX Symposium on Operating Systems Design and Implementation,OSDI 2020 - Virtual, Online
Duration: Nov 4 2020Nov 6 2020

Publication series

NameProceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2020

Conference

Conference14th USENIX Symposium on Operating Systems Design and Implementation,OSDI 2020
CityVirtual, Online
Period11/4/2011/6/20

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems

Fingerprint

Dive into the research topics of 'Pipe switch: Fast pipelined context switching for deep learning applications'. Together they form a unique fingerprint.

Cite this