Purpose: To develop and evaluate a volumetric modulated arc therapy (VMAT) machine parameter optimization (MPO) approach based on deep-Q reinforcement learning (RL) capable of finding an optimal machine control policy using previous prostate cancer patient CT scans and contours, and applying the policy to new cases to rapidly produce deliverable VMAT plans in a simplified beam model.
Methods: A convolutional deep-Q network was employed to control the dose rate and multileaf collimator of a C-arm linear accelerator model using the current dose distribution and machine parameter state as input. A Q-value was defined as the discounted cumulative cost based on dose objectives, and experience-replay RL was performed to determine a policy to minimize the Q-value. A two-dimensional network design was employed which optimized each opposing leaf pair independently while monitoring the corresponding dose plane blocked by those leaves. This RL approach was applied to CT scans and contours from 40 retrospective prostate cancer patients. The dataset was split into training (15 patients) and validation (5 patients) groups to optimize the network, and its performance was tested in an independent cohort of 20 patients by comparing RL-based dose distributions to conformal arcs and clinical intensity-modulated radiotherapy (IMRT) delivering a prescription dose of 78 Gy in 40 fractions.
Results: Mean ± SD execution time of the RL VMAT optimization was 1.5 ± 0.2 s per slice. In the test cohort, mean ± SD (P-value) planning target volume (PTV), bladder, and rectum doses were 80.5 ± 2.0 Gy (P < 0.001), 44.2 ± 14.6 Gy (P < 0.001), and 43.7 ± 11.1 Gy (P < 0.001) for RL VMAT compared to 81.6 ± 1.1 Gy, 51.6 ± 12.9 Gy, and 36.0 ± 12.3 Gy for clinical IMRT.
Conclusions: RL was applied to VMAT MPO using clinical patient contours without independently optimized treatment plans for training and achieved comparable target and normal tissue dose to clinical plans despite the application of a relatively simple network design originally developed for video-game control. These results suggest that extending an RL approach to a full three-dimensional beam model could enable rapid artificial intelligence-based optimization of deliverable treatment plans, reducing the time required for radiotherapy planning without requiring previous plans for training.
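The Methods define the Q-value as a discounted cumulative cost that the learned policy tries to minimize. The sketch below illustrates that idea in plain Python under stated assumptions; it is not the authors' implementation. The function names, the discount factor `gamma`, and the example cost values are all hypothetical, and the deep-Q network itself is replaced by a plain list of per-action Q-value estimates.

```python
from typing import Sequence


def discounted_cost(costs: Sequence[float], gamma: float) -> float:
    """Return sum_k gamma**k * costs[k], accumulated from the last step backwards."""
    total = 0.0
    for cost in reversed(costs):
        total = cost + gamma * total
    return total


def td_target(cost: float, next_q: Sequence[float], gamma: float,
              terminal: bool) -> float:
    """One-step learning target for a cost-minimizing Q-function.

    Because Q is a cost (not a reward), the bootstrap term uses the
    minimum over next-state action values rather than the maximum.
    """
    if terminal:
        return cost
    return cost + gamma * min(next_q)


def greedy_action(q_values: Sequence[float]) -> int:
    """Index of the action with the lowest estimated cumulative cost."""
    return min(range(len(q_values)), key=q_values.__getitem__)


if __name__ == "__main__":
    # Hypothetical per-step dose-objective costs along one arc.
    print(discounted_cost([1.0, 2.0, 4.0], gamma=0.5))
    # Hypothetical Q-value estimates for three candidate machine actions.
    print(greedy_action([3.0, 2.0, 5.0]))
```

In an experience-replay setting, targets such as `td_target` would be computed on stored transitions sampled from a buffer and used to fit the network; that training loop is omitted here.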
Keywords
- artificial intelligence
- deep-Q learning
- reinforcement learning
- treatment planning
- volumetric modulated arc therapy
ASJC Scopus subject areas
- Radiology, Nuclear Medicine and Imaging