Hyperspectral unmixing, which decomposes mixed pixels into endmembers and their corresponding abundances, is an important processing step for downstream applications of hyperspectral images (HSIs). Recently, deep learning techniques, particularly autoencoders (AEs), have been applied to the unmixing problem. However, most of these methods rely on the simple linear mixing model (LMM), which ignores the spectral variability of endmembers across pixels. In this article, we present a multi-attention AE network (MAAENet) based on the extended LMM to address spectral variability in real scenes. In addition, most AE networks operate pixel- or patch-wise and therefore ignore the global spatial information in HSIs. We use attention mechanisms to design a spatial–spectral attention (SSA) module that handles band redundancy in HSIs and extracts global spatial features through spectral correlation. Furthermore, because mixed pixels tend to occur at the boundaries between different materials, we design a novel sparse constraint based on spatial homogeneity that regularizes the abundances and extracts local spatial features. Ablation experiments verify the effectiveness of the proposed AE structure, SSA module, and sparse constraint. Compared with several state-of-the-art unmixing methods, the proposed method achieves competitive performance on both synthetic and real datasets.
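To make the difference between the two mixing models concrete, the sketch below forward-models a single pixel. It assumes the commonly used form of the extended LMM, in which each endmember spectrum is multiplied by a positive, pixel-dependent scaling factor before mixing; the abstract does not give MAAENet's exact formulation, so this is an illustration of the model family, not the authors' implementation. The function names (`lmm_pixel`, `elmm_pixel`, `check_abundances`) are hypothetical, and additive noise is omitted.

```python
from typing import List

Vector = List[float]


def lmm_pixel(endmembers: List[Vector], abundances: Vector) -> Vector:
    """Linear mixing model (noise omitted): x = sum_p a_p * m_p."""
    n_bands = len(endmembers[0])
    # Each band of the pixel is the abundance-weighted sum of that band over endmembers.
    return [
        sum(a * m[b] for a, m in zip(abundances, endmembers))
        for b in range(n_bands)
    ]


def elmm_pixel(endmembers: List[Vector], abundances: Vector, scales: Vector) -> Vector:
    """Extended LMM (assumed form, noise omitted): x = sum_p psi_p * a_p * m_p.

    psi_p > 0 rescales endmember p in this pixel only, modeling spectral
    variability such as illumination changes. With all psi_p = 1 it reduces to the LMM.
    """
    scaled = [[s * v for v in m] for s, m in zip(scales, endmembers)]
    return lmm_pixel(scaled, abundances)


def check_abundances(abundances: Vector, tol: float = 1e-9) -> bool:
    """Abundance nonnegativity and sum-to-one constraints."""
    return all(a >= 0 for a in abundances) and abs(sum(abundances) - 1.0) <= tol


if __name__ == "__main__":
    # Two made-up 3-band endmember spectra, for illustration only.
    m = [[0.2, 0.4, 0.6], [0.8, 0.6, 0.4]]
    a = [0.25, 0.75]
    print(lmm_pixel(m, a))               # approximately [0.65, 0.55, 0.45]
    print(elmm_pixel(m, a, [2.0, 0.5]))  # approximately [0.4, 0.425, 0.45]
```

Under the LMM, a given endmember contributes the same spectral shape and magnitude to every pixel. The scaling factors let the same material appear brighter or darker from pixel to pixel, which is one source of the spectral variability that the plain LMM cannot represent.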