Building extraction by deep learning from remote sensing images is currently a research hotspot. PSPNet is one of the classic semantic segmentation models and is currently adopted by many applications. Moreover, PSPNet can use not only CNN-based networks but also transformer-based networks as backbones; therefore, PSPNet also has high value in the transformer era. The core of PSPNet is the pyramid pooling module, which gives PSPNet the ability to capture the local features of different scales. However, the pyramid pooling module also has obvious shortcomings. The grid is fixed, and the pixels close to the edge of the grid cannot obtain the entire local features. To address this issue, an improved PSPNet network architecture named shift pooling PSPNet is proposed, which uses a module called shift pyramid pooling to replace the original pyramid pooling module, so that the pixels at the edge of the grid can also obtain the entire local features. Shift pooling is not only useful for PSPNet but also in any network that uses a fixed grid for downsampling to increase the receptive field and save computing, such as ResNet. A dense connection was adopted in decoding, and upsampling was gradually carried out. With two open datasets, the improved PSPNet, PSPNet, and some classic image segmentation models were used for comparative experiments. The results show that our method is the best according to the evaluation metrics, and the predicted image is closer to the label.