Coarse-spatial-resolution sensors play a major role in capturing temporal variation, because satellites that image at fine spatial scales have relatively long revisit cycles. This trade-off between revisit cycle and spatial resolution limits access to terrestrial latent heat flux (LE) data at both fine spatial and fine temporal resolution. In this paper, we first investigated the capability of an Extremely Randomized Trees Fusion Model (ERTFM) to reconstruct high-spatiotemporal-resolution reflectance data by fusing Chinese GaoFen-1 (GF-1) imagery with Moderate Resolution Imaging Spectroradiometer (MODIS) products. Then, based on the merged reflectance data, we used a Modified-Satellite Priestley–Taylor (MS–PT) algorithm to generate LE products at high spatial and temporal resolution. Our results showed that the ERTFM-based reflectance estimates closely resembled the observed GF-1 images, and the predicted normalized difference vegetation index (NDVI) agreed well with the observed NDVI on the two corresponding dates (r = 0.76 and 0.86, respectively). Compared with four other fusion methods, including the widely used spatial and temporal adaptive reflectance fusion model (STARFM) and the enhanced STARFM (ESTARFM), ERTFM predicted reflectance most accurately (structural similarity index, SSIM = 0.91; r = 0.77). Further analysis revealed that LE estimates derived from ERTFM-based data captured more detailed spatiotemporal characteristics and agreed closely with site-level LE observations, with an R² of 0.81 and an RMSE of 19.18 W/m². Our findings suggest that ERTFM can improve LE estimation at high temporal frequency and high spatial resolution, and therefore has great potential to support agricultural monitoring and irrigation management.
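The agreement statistics quoted above (NDVI from fused reflectance, Pearson r, R² and RMSE) can be illustrated with a short standard-library Python sketch. This is not the authors' evaluation code: the function names are hypothetical, the sample values in the usage section are invented for illustration only, and R² is assumed here to be the squared Pearson correlation, a convention the abstract does not specify. SSIM is omitted because it requires windowed image statistics.

```python
import math
from typing import Sequence


def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index: (NIR - red) / (NIR + red)."""
    denom = nir + red
    if denom == 0:
        raise ValueError("NIR + red reflectance must be non-zero")
    return (nir - red) / denom


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples with at least 2 values each")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and variance sums (the 1/n factors cancel in the ratio).
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Root-mean-square error, in the units of the inputs (e.g. W/m2 for LE)."""
    if len(pred) != len(obs) or not pred:
        raise ValueError("need two non-empty paired samples")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))


def r_squared(pred: Sequence[float], obs: Sequence[float]) -> float:
    """ASSUMPTION: R^2 taken as the squared Pearson r (linear-fit convention)."""
    return pearson_r(pred, obs) ** 2


if __name__ == "__main__":
    # Hypothetical (NIR, red) reflectance pairs; NOT data from the study.
    fused = [(0.32, 0.08), (0.41, 0.06), (0.25, 0.11), (0.38, 0.07)]
    observed = [(0.30, 0.09), (0.43, 0.05), (0.27, 0.10), (0.36, 0.08)]
    ndvi_pred = [ndvi(nir, red) for nir, red in fused]
    ndvi_obs = [ndvi(nir, red) for nir, red in observed]
    print("NDVI r:", round(pearson_r(ndvi_pred, ndvi_obs), 3))

    # Hypothetical LE values (W/m2) at a flux site; NOT data from the study.
    le_pred = [85.0, 120.0, 140.0, 95.0]
    le_obs = [90.0, 110.0, 150.0, 100.0]
    print("LE R2:", round(r_squared(le_pred, le_obs), 3))
    print("LE RMSE (W/m2):", round(rmse(le_pred, le_obs), 2))
```

In practice these statistics would be computed per pixel over whole scenes (reflectance and NDVI) or over a time series at each flux site (LE), typically with array libraries rather than plain lists.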