Treatment planning is an essential step in radiation therapy (RT). It requires solving a complex inverse optimization problem, in which a set of parameters is adjusted by time-consuming trial and error until clinical objectives are met. Fast and robust treatment plan optimization is therefore important for effective RT of cancer patients. We propose an artificial intelligence (AI)-based RT planning strategy that uses deep-Q reinforcement learning (RL) to automatically optimize machine parameters by finding an optimal machine control policy. The approach takes the RT planning CT and contours as input and trains dual deep-Q networks to control the dose rate and multi-leaf collimator positions based on the current dose distribution and machine parameter state. The Q-value is computed as the discounted cumulative cost based on dose objectives and is minimized by experience-replay RL to determine the policy. The proposed approach was applied to prostate cancer RT planning and validated on 10 prostate cancer cases. Dose distributions generated by RL were compared to conformal arcs and clinical intensity-modulated radiotherapy (IMRT) plans. RL generated RT plans with target and normal tissue doses comparable to clinical plans, with mean±SD doses of 83.1±1.7 Gy, 39.9±10.0 Gy, and 39.6±13.9 Gy for the planning target volume (PTV), rectum, and bladder, respectively (vs 84.4±1.0 Gy, 41.8±15.0 Gy, and 50.6±11.4 Gy for clinical IMRT). This preliminary study demonstrates the potential of an RL approach to enable rapid AI-based RT plan optimization, which could substantially reduce the time and burden required for RT planning.
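
To illustrate the cost-based Q-value and experience replay mentioned above, the following is a minimal Python sketch, not the implementation used in this study. The function names, the `ReplayBuffer` class, the discount factor, and the scalar per-step cost are all hypothetical; the abstract does not specify the form of the dose-objective cost, the network architecture, or how the two networks are coupled. The sketch shows only the general pattern: a discounted cumulative cost, a cost-minimizing temporal-difference target (a minimum over next-step Q-values rather than the maximum used with rewards), and a fixed-capacity buffer from which past transitions are sampled.

```python
import random
from collections import deque


def discounted_cumulative_cost(costs, gamma):
    """Return sum over t of gamma**t * costs[t] for one trajectory.

    `costs` stands in for per-step dose-objective costs (assumed scalar).
    """
    total = 0.0
    # Accumulate backwards so each step applies the discount once.
    for cost in reversed(costs):
        total = cost + gamma * total
    return total


def td_target(cost, next_q_values, gamma, done):
    """Cost-minimizing one-step target: cost + gamma * min over next Q-values.

    Because Q represents a cost to be minimized, the greedy next action
    is the one with the smallest Q-value.
    """
    if done:
        return cost
    return cost + gamma * min(next_q_values)


class ReplayBuffer:
    """Fixed-capacity store of (state, action, cost, next_state, done)."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest entries are dropped

    def push(self, state, action, cost, next_state, done):
        self.buffer.append((state, action, cost, next_state, done))

    def sample(self, batch_size):
        # Sample without replacement; never request more than is stored.
        k = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)
```

In a deep-Q setting, the sampled transitions would be used to regress the network's predicted Q-value for each stored state-action pair toward `td_target`; here the state could encode the current dose distribution and machine parameters, and the action a change in dose rate or leaf position, as described above.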