Improving the efficiency of robotic milling increases productivity and reduces machining costs. Although feed rate and spindle speed critically influence efficiency, chatter instability complicates their optimization: existing stability constraints ignore state-dependent regenerative mechanisms, and the nonlinear effects of feed add further difficulty. Adjusting the robotic configuration can improve stability, but operational constraints such as joint singularity must also be respected. This work proposes a reinforcement learning (RL) method that jointly optimizes feed rate, spindle speed, and robotic configuration. The RL agent dynamically maximizes efficiency in this high-dimensional space using a reward function that integrates stability and operability. Simulation results validate the method's superior performance.
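To make the reward design concrete, the following is a minimal sketch of one plausible shaping, not the paper's actual formulation: the function names, weights, and the use of material removal rate as the efficiency term and a manipulability index as the operability term are all illustrative assumptions.

```python
import math

def milling_reward(mrr, stable, manipulability,
                   w_stability=10.0, w_operability=1.0, eps=1e-6):
    """Hypothetical reward combining efficiency, stability, and operability.

    mrr           -- material removal rate, a proxy for milling efficiency
                     (grows with feed rate and cutting engagement).
    stable        -- True if the current (feed, spindle speed, configuration)
                     lies inside the chatter stability region.
    manipulability-- manipulability index of the robot pose; values near
                     zero indicate proximity to a joint singularity.
    """
    reward = mrr                           # encourage productivity
    if not stable:
        reward -= w_stability              # penalize chatter instability
    # log-barrier term pushes the agent away from singular configurations
    reward -= w_operability * max(0.0, -math.log(manipulability + eps))
    return reward
```

A stable, well-conditioned pose keeps the full efficiency term, while chatter or a near-singular configuration sharply reduces the reward, which is the trade-off the abstract describes.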