With the rapid development of machine learning, deep reinforcement learning (DRL) algorithms have recently been widely applied to energy management in hybrid electric urban buses (HEUBs). However, current DRL-based strategies suffer from insufficient constraint-handling capability, slow learning, and unstable convergence. In this study, a state-of-the-art continuous-control DRL algorithm, softmax deep double deterministic policy gradients (SD3), is used to develop the energy management system of a power-split HEUB. In particular, an action masking (AM) technique that does not alter SD3's underlying principles is proposed to prevent the SD3-based strategy from outputting invalid actions that violate the system's physical constraints. Additionally, transfer learning (TL) of the SD3-based strategy is explored to avoid retraining the neural networks from scratch for different driving cycles. The results demonstrate that the learning performance and learning stability of SD3 are unaffected by AM, and that SD3 with AM achieves control performance highly comparable to dynamic programming on both the CHTC-B and WVUCITY driving cycles. Moreover, TL accelerates the development of SD3-based strategies, speeding up convergence by at least 67.61% without significantly affecting fuel economy.
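The abstract does not spell out how the action masking works; as a minimal illustrative sketch (not the paper's implementation), masking a continuous action can be done by clipping the policy's raw output to the state-dependent feasible set before it reaches the environment. All names, the normalized action convention, and the SOC bounds below are assumptions for illustration only.

```python
import numpy as np

def mask_action(raw_action, soc, soc_min=0.4, soc_max=0.8):
    """Hypothetical action-masking sketch: clip the policy's raw
    continuous action (a normalized battery-power command in [-1, 1],
    negative = discharge) to the subset that respects assumed
    state-of-charge (SOC) limits, so invalid actions never reach
    the powertrain model."""
    lo, hi = -1.0, 1.0
    if soc <= soc_min:      # battery near lower bound: forbid further discharge
        lo = 0.0
    elif soc >= soc_max:    # battery near upper bound: forbid further charge
        hi = 0.0
    return float(np.clip(raw_action, lo, hi))
```

Because the mask only post-processes the policy output, the learning algorithm itself is untouched, which matches the paper's claim that AM does not alter SD3's underlying principles.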