Abstract
Positron emission tomography (PET) is widely used in clinics and research due to its quantitative merits and high sensitivity, but suffers from low signal-to-noise ratio (SNR). Recently, convolutional neural networks (CNNs) have been widely used to improve PET image quality. Though successful and efficient at local feature extraction, CNNs cannot capture long-range dependencies well because of their limited receptive fields. Global multi-head self-attention (MSA) is a popular approach to capturing long-range information. However, computing global MSA for 3D images is computationally expensive. In this work, we propose an efficient spatial and channel-wise encoder-decoder transformer, Spach Transformer, that can leverage spatial and channel information based on local and global MSAs. Experiments on datasets of different PET tracers, i.e., 18F-FDG, 18F-ACBC, 18F-DCFPyL, and 68Ga-DOTATATE, were conducted to evaluate the proposed framework. Quantitative results show that the proposed Spach Transformer framework outperforms state-of-the-art deep learning architectures.
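To make the cost argument concrete, the following is a minimal sketch, not the authors' implementation, that counts attention-score entries per head for three attention layouts on a 3D feature map. The function name, the 64×64×64 volume, the 48 channels, and the window size of 8 are all hypothetical values chosen for illustration. Pairing window-based attention with the spatial axis and global attention with the channel axis is likewise an assumption of this sketch; the abstract only states that local and global MSAs are combined.

```python
def attention_matrix_entries(shape, channels, window):
    """Count attention-score entries per head for three attention layouts.

    shape    -- (depth, height, width) of the 3D feature map (hypothetical)
    channels -- number of feature channels (hypothetical)
    window   -- edge length of a cubic local window; assumed to divide
                every spatial dimension evenly
    """
    d, h, w = shape
    n_tokens = d * h * w                      # one token per voxel
    tokens_per_window = window ** 3
    num_windows = (d // window) * (h // window) * (w // window)
    return {
        # Global spatial MSA: every voxel attends to every other voxel.
        "global_spatial": n_tokens * n_tokens,
        # Local spatial MSA: attention is restricted to each window.
        "local_spatial": num_windows * tokens_per_window ** 2,
        # Channel-wise MSA: the score matrix is channels x channels,
        # independent of the number of voxels.
        "channel_wise": channels * channels,
    }


costs = attention_matrix_entries((64, 64, 64), channels=48, window=8)
for name, entries in costs.items():
    print(f"{name:>15}: {entries:,} entries")
```

Under these assumed sizes, the global spatial score matrix is larger than the windowed one by a factor equal to the number of windows, while the channel-wise score matrix does not grow with the volume size at all. This count covers only the score matrices, not the cost of the projections or of forming the scores.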
Original language | English (US) |
---|---|
Pages (from-to) | 2036-2049 |
Number of pages | 14 |
Journal | IEEE Transactions on Medical Imaging |
Volume | 43 |
Issue number | 6 |
DOIs | |
State | Published - Jun 1 2024 |
Keywords
- Positron emission tomography
- image denoising
- local and global self-attention
- low-dose PET
- spatial and channel-wise transformer
ASJC Scopus subject areas
- Software
- Radiological and Ultrasound Technology
- Electrical and Electronic Engineering
- Computer Science Applications