Controlling the air-conditioning temperature is key to reducing energy consumption and improving thermal comfort on high-speed rail, and accurate prediction of the supply air temperature is intended to improve control performance. Existing studies of supply air temperature prediction are interdisciplinary, involving heat transfer science and computer science, and define the problem as time-series prediction. However, the underlying system is widely recognized as complex, nonlinear, and dynamic. This makes it difficult for existing statistical and deep learning methods, e.g., the autoregressive integrated moving average (ARIMA) model, convolutional neural networks (CNN), and long short-term memory (LSTM) networks, to fully capture the interactions among the variables involved and provide accurate predictions. Recent studies have shown the potential of the Transformer to increase prediction capacity. To tackle these challenges, this paper proposes an improved Temporal Fusion Transformer (TFT) prediction model for supply air temperature in high-speed train carriages, with two improvements: (i) a double-convolutional residual encoder structure based on dilated causal convolution; (ii) a spatio-temporal double-gated structure based on Gated Linear Units. Moreover, this study designs a loss function for temperature forecasting that is suitable for general long-sequence time-series forecasting tasks. Empirical simulations using a high-speed rail air-conditioning operation dataset from a specific location in China show that, for the temperature prediction of the two units, the improved TFT model improves the MAPE by 21.70% and 11.73%, respectively, compared with the original model. Furthermore, experiments demonstrate that the model outperforms seven popular methods on time-series prediction tasks, and the attention of the model along the time dimension is analyzed.
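The two structural improvements named above build on standard components: dilated causal convolution and the Gated Linear Unit (GLU). The following is a minimal sketch of those building blocks only, not the architecture proposed in this paper. The function names, the kernel weights, and the way the pieces are combined in `gated_residual_block` are illustrative assumptions.

```python
import math


def dilated_causal_conv(x, weights, dilation):
    """1-D dilated causal convolution over a list of floats.

    The output at step t depends only on x[t], x[t - d], x[t - 2d], ...
    Positions before the start of the sequence are treated as zero
    (left padding), so the output has the same length as the input.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation  # look back only, never ahead
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def glu(values, gates):
    """Gated Linear Unit: elementwise values * sigmoid(gates)."""
    return [v * sigmoid(g) for v, g in zip(values, gates)]


def gated_residual_block(x, value_w, gate_w, dilation):
    """Hypothetical combination: a residual connection around a
    GLU-gated dilated causal convolution (an assumption, for illustration)."""
    values = dilated_causal_conv(x, value_w, dilation)
    gates = dilated_causal_conv(x, gate_w, dilation)
    return [xi + hi for xi, hi in zip(x, glu(values, gates))]


if __name__ == "__main__":
    series = [1.0, 2.0, 3.0, 4.0, 5.0]
    # Kernel of size 2 with dilation 2: each output is x[t] + x[t - 2].
    print(dilated_causal_conv(series, [1.0, 1.0], dilation=2))
    # prints [1.0, 2.0, 4.0, 6.0, 8.0]
```

Because each output looks only backward in time, changing a later input never alters an earlier output, which is the property that makes this kind of convolution suitable for forecasting. The gate lets the block scale how much of the convolved signal is added back onto the input.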