TY - GEN
T1 - FastSAM3D
T2 - 27th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2024
AU - Shen, Yiqing
AU - Li, Jingxing
AU - Shao, Xinyuan
AU - Inigo Romillo, Blanca
AU - Jindal, Ankush
AU - Dreizin, David
AU - Unberath, Mathias
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
PY - 2024
Y1 - 2024
N2 - Segment anything models (SAMs) are gaining attention for their zero-shot generalization capability in segmenting objects of unseen classes and in unseen domains when properly prompted. Interactivity is a key strength of SAMs, allowing users to iteratively provide prompts that specify objects of interest to refine outputs. However, to realize the interactive use of SAMs for 3D medical imaging tasks, rapid inference times are necessary. High memory requirements and long processing delays remain constraints that hinder the adoption of SAMs for this purpose. Specifically, while 2D SAMs applied to 3D volumes contend with repetitive computation to process all slices independently, 3D SAMs suffer from an exponential increase in model parameters and FLOPS. To address these challenges, we present FastSAM3D which accelerates SAM inference to 8 milliseconds per 128×128×128 3D volumetric image on an NVIDIA A100 GPU. This speedup is accomplished through 1) a novel layer-wise progressive distillation scheme that enables knowledge transfer from a complex 12-layer ViT-B to a lightweight 6-layer ViT-Tiny variant encoder without training from scratch; and 2) a novel 3D sparse flash attention to replace vanilla attention operators, substantially reducing memory needs and improving parallelization. Experiments on three diverse datasets reveal that FastSAM3D achieves a remarkable speedup of 527.38× compared to 2D SAMs and 8.75× compared to 3D SAMs on the same volumes without significant performance decline. Thus, FastSAM3D opens the door for low-cost truly interactive SAM-based 3D medical imaging segmentation with commonly used GPU hardware. Code is available at https://github.com/arcadelab/FastSAM3D.
KW - Foundation Model
KW - Interactive Segmentation
KW - Model Acceleration
KW - Segment Anything Model (SAM)
UR - http://www.scopus.com/inward/record.url?scp=85208168579&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85208168579&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-72390-2_51
DO - 10.1007/978-3-031-72390-2_51
M3 - Conference contribution
AN - SCOPUS:85208168579
SN - 9783031723896
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 542
EP - 552
BT - Medical Image Computing and Computer Assisted Intervention – MICCAI 2024 - 27th International Conference, Proceedings
A2 - Linguraru, Marius George
A2 - Dou, Qi
A2 - Feragen, Aasa
A2 - Giannarou, Stamatia
A2 - Glocker, Ben
A2 - Lekadir, Karim
A2 - Schnabel, Julia A.
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 6 October 2024 through 10 October 2024
ER -