The overcrowding of the wireless space has triggered stiff competition for scarce network resources. There is therefore a need for a dynamic spectrum access (DSA) technique that ensures fair allocation of the available resources among the diverse network elements competing for them. Spectrum handoff (SH) is a DSA technique through which cognitive radio (CR) promises to provide effective channel utilization, fair resource allocation, and reliable, uninterrupted real-time connections. However, SH may consume extra network resources, increase latency, and degrade network performance if the spectrum sensing technique is ineffective and the channel selection strategy (CSS) is poorly implemented. It is therefore necessary to develop an SH policy that holistically considers an effective CSS and spectrum sensing technique while minimizing communication delays. In this work, two reinforcement learning (RL) algorithms are integrated into the CSS to perform channel selection. The first algorithm estimates the future occupancy of each channel, whereas the second determines channel quality so that the channels in the candidate channel list (CCL) can be sorted and ranked. A method of masking linearly dependent and useless state elements is implemented to improve the convergence of the learning. Compared with conventional approaches, our approach showed a significant reduction in latency and a remarkable improvement in throughput.
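The two-learner ranking idea can be illustrated with a minimal sketch. The abstract does not name the RL algorithms, so the code below substitutes a simple incremental value update (a bandit-style learner) for both. The class `ChannelEstimator`, the functions `rank_ccl` and `simulate`, the `weight` parameter, and all channel statistics are hypothetical and chosen only for illustration.

```python
import random


class ChannelEstimator:
    """Per-channel running estimate, updated as V <- V + alpha * (r - V).

    Stand-in for one of the two learners; not the paper's algorithm.
    """

    def __init__(self, n_channels, alpha=0.02, initial=0.5):
        self.alpha = alpha
        self.values = [initial] * n_channels

    def update(self, channel, reward):
        v = self.values[channel]
        self.values[channel] = v + self.alpha * (reward - v)


def rank_ccl(idle_values, quality_values, weight=0.5):
    """Return channel indices sorted best-first by a weighted score.

    `weight` (an assumption) trades off predicted idleness against quality.
    """
    scores = [weight * i + (1.0 - weight) * q
              for i, q in zip(idle_values, quality_values)]
    return sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)


def simulate(n_steps=6000, seed=0):
    """Train both estimators on a toy three-channel environment."""
    rng = random.Random(seed)
    idle_prob = [0.9, 0.5, 0.1]      # hypothetical chance each channel is idle
    mean_quality = [0.8, 0.6, 0.7]   # hypothetical normalized channel quality
    idle_est = ChannelEstimator(3)
    quality_est = ChannelEstimator(3)
    for _ in range(n_steps):
        ch = rng.randrange(3)        # sense a random channel
        idle = 1.0 if rng.random() < idle_prob[ch] else 0.0
        idle_est.update(ch, idle)    # learner 1: occupancy
        if idle:
            # Quality is only observable when the channel is free to use.
            q = min(1.0, max(0.0, rng.gauss(mean_quality[ch], 0.05)))
            quality_est.update(ch, q)  # learner 2: quality
    return rank_ccl(idle_est.values, quality_est.values)


if __name__ == "__main__":
    print("Candidate channel list, best first:", simulate())
```

In this toy setting the channel that is both mostly idle and of good quality should head the list, so it would be tried first at handoff. A full implementation would replace the running averages with the state-based learners and the state-element masking described in the abstract.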