Smoke detection based on computer vision is a popular research direction in fire detection and is widely used for outdoor fire detection (e.g., forest fire detection). Smoke detection often relies on features such as color, shape, texture, and motion to distinguish smoke from non-smoke objects. However, these features are neither salient nor robust enough, which results in low smoke detection performance in complex environments. Deep learning has improved smoke detection performance to a certain degree, but extracting detailed smoke features is difficult when the network has few layers. Moreover, when the motion characteristics of smoke are not used effectively, indicators such as the false alarm rate remain high in video smoke detection. To improve the detection of smoke objects in videos, this paper proposes the concept of a change-cumulative image: the YUV color space of multi-frame video images is converted into a single change-cumulative image that represents the motion and color-change characteristics of smoke. A fusion deep network is then designed, which increases the depth of the VGG16 network by arranging two convolutional layers after each of its convolutional layers. The fusion deep network also combines the VGG16 and ResNet50 (deep residual network) models to improve feature expression ability while increasing the depth of the whole network, which helps extract additional discriminative characteristics of smoke.
Experimental results show that using the change-cumulative image as the input to the deep network model yields better smoke detection performance than using the classic RGB input image. The fusion deep network model also performs better than the single VGG16 and ResNet50 network models. In addition, the smoke detection accuracy, false positive rate, and false alarm rate of the proposed method are better than those of current popular video smoke detection methods.
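
To make the idea of a change-cumulative image more concrete, the following minimal sketch shows one possible reading of it. The abstract does not give the exact formula, so the accumulation rule used here (summing per-pixel absolute differences between consecutive frames in each YUV channel) is an assumption, as are the function names `rgb_to_yuv` and `change_cumulative_image`. The RGB-to-YUV coefficients are the standard BT.601 values. Nested lists are used so that the sketch runs without third-party packages.

```python
def rgb_to_yuv(pixel):
    """Convert one (R, G, B) pixel to (Y, U, V) using BT.601 coefficients."""
    r, g, b = pixel
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.14713 * r - 0.28886 * g + 0.436 * b
    v = 0.615 * r - 0.51499 * g - 0.10001 * b
    return (y, u, v)


def change_cumulative_image(frames):
    """Accumulate inter-frame YUV changes over a sequence of frames.

    `frames` is a list of images, each an H x W grid of (R, G, B) tuples.
    ASSUMPTION: the accumulation rule below (sum of absolute differences
    between consecutive frames, per channel) is illustrative only and is
    not taken from the paper.
    """
    if len(frames) < 2:
        raise ValueError("at least two frames are needed to measure change")

    # Convert every frame from RGB to YUV.
    yuv_frames = [[[rgb_to_yuv(p) for p in row] for row in frame]
                  for frame in frames]

    height = len(frames[0])
    width = len(frames[0][0])
    # One accumulator per pixel and per channel, initialised to zero.
    acc = [[[0.0, 0.0, 0.0] for _ in range(width)] for _ in range(height)]

    # Add the absolute change between each pair of consecutive frames.
    for prev, curr in zip(yuv_frames, yuv_frames[1:]):
        for i in range(height):
            for j in range(width):
                for c in range(3):
                    acc[i][j][c] += abs(curr[i][j][c] - prev[i][j][c])
    return acc


# Usage: a 1 x 2 image over three frames; only the left pixel changes.
frames = [
    [[(0, 0, 0), (10, 10, 10)]],
    [[(255, 255, 255), (10, 10, 10)]],
    [[(0, 0, 0), (10, 10, 10)]],
]
cci = change_cumulative_image(frames)
```

Under this assumed rule, static background pixels accumulate no change, while pixels whose luminance or chrominance varies over time (as in drifting smoke) accumulate large values, so a single image carries both motion and color-change information to the network.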