
Remote Sensing, Vol. 11, Pages 185: Evaluation of Sampling and Cross-Validation Tuning Strategies for Regional-Scale Machine Learning Classification (Remote Sensing)


20 January 2019 05:00:02


High spatial resolution (1–5 m) remotely sensed datasets are increasingly used to map land cover over large geographic areas with supervised machine learning algorithms. Although many studies have compared machine learning classification methods, the sample selection methods used to acquire training and validation data and the cross-validation techniques used to tune classifier parameters are rarely investigated, particularly on large, high spatial resolution datasets. This work therefore examines four sample selection methods (simple random, proportional stratified random, disproportional stratified random, and deliberative sampling) and three cross-validation tuning approaches (k-fold, leave-one-out, and Monte Carlo). It also investigates how accuracy is affected when sample selection is localized to a small geographic subset of the entire area, an approach sometimes used to reduce the cost of training data collection. These methods are investigated in the context of support vector machine (SVM) classification and geographic object-based image analysis (GEOBIA), using high spatial resolution National Agriculture Imagery Program (NAIP) orthoimagery and LIDAR-derived rasters covering a 2,609 km² regional-scale area in northeastern West Virginia, USA. Stratified, statistically based sampling methods were found to generate the highest classification accuracy. Using a small number of training samples collected from only a subset of the study area provided a similar level of overall accuracy to a sample of equivalent size collected in a dispersed manner across the entire regional-scale dataset. There were minimal differences in accuracy among the cross-validation tuning methods. Processing times for Monte Carlo and leave-one-out cross-validation were high, especially with large training sets. For this reason, k-fold cross-validation appears to be a good choice.
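The processing-time remark about the three cross-validation approaches can be illustrated by counting how many model fits each one needs. The following is a minimal sketch using only the Python standard library; the function names, the training-set size of 1000, k = 10, and 100 Monte Carlo repeats with a 20% hold-out are all assumptions chosen for illustration, not settings reported in the abstract.

```python
import random


def k_fold_splits(n, k, seed=0):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def leave_one_out_splits(n):
    """Yield n splits, each holding out a single sample."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]


def monte_carlo_splits(n, n_repeats, test_fraction=0.2, seed=0):
    """Yield n_repeats independent random train/test partitions."""
    rng = random.Random(seed)
    n_test = max(1, round(n * test_fraction))
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]


# Hypothetical training-set size; each split corresponds to one model fit
# per candidate parameter setting during tuning.
n_samples = 1000
fits = {
    "k-fold (k=10)": sum(1 for _ in k_fold_splits(n_samples, 10)),
    "leave-one-out": sum(1 for _ in leave_one_out_splits(n_samples)),
    "Monte Carlo (100 repeats)": sum(1 for _ in monte_carlo_splits(n_samples, 100)),
}
for name, count in fits.items():
    print(f"{name}: {count} fits per parameter setting")
```

Under these assumed settings, k-fold needs 10 fits, Monte Carlo needs 100, and leave-one-out needs one fit per training sample (1000 here), so its cost grows with the size of the training set.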
Classifiers trained with deliberately (i.e., non-randomly) collected samples were less accurate than classifiers trained with statistically based samples. This may be due to the high positive spatial autocorrelation in the deliberative training set. Thus, where possible, training samples should be selected randomly, and deliberative samples should be avoided.
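The three statistical sampling designs named in the abstract differ in how they allocate samples among classes. The sketch below shows one plausible reading of each, using only the Python standard library. The class names, class sizes, and sample size are hypothetical, and treating "disproportional stratified" as an equal allocation per class is an assumption; the abstract does not say which allocation the study used.

```python
import random
from collections import Counter, defaultdict


def group_by_class(labels):
    """Map each class label to the list of indices carrying that label."""
    groups = defaultdict(list)
    for i, label in enumerate(labels):
        groups[label].append(i)
    return groups


def simple_random(labels, n, rng):
    """Draw n indices without regard to class."""
    return rng.sample(range(len(labels)), n)


def proportional_stratified(labels, n, rng):
    """Draw from each class in proportion to its share of the population.

    Rounding can make the total differ slightly from n.
    """
    picked = []
    for members in group_by_class(labels).values():
        quota = round(n * len(members) / len(labels))
        picked += rng.sample(members, min(quota, len(members)))
    return picked


def equal_stratified(labels, n, rng):
    """Assumed disproportional design: the same number from every class."""
    groups = group_by_class(labels)
    per_class = n // len(groups)
    picked = []
    for members in groups.values():
        picked += rng.sample(members, min(per_class, len(members)))
    return picked


# Hypothetical, strongly imbalanced population of image objects.
labels = ["forest"] * 9000 + ["grass"] * 900 + ["water"] * 100
rng = random.Random(42)

proportional = Counter(labels[i] for i in proportional_stratified(labels, 300, rng))
equal = Counter(labels[i] for i in equal_stratified(labels, 300, rng))
simple = Counter(labels[i] for i in simple_random(labels, 300, rng))

print("proportional:", dict(proportional))
print("equal allocation:", dict(equal))
print("simple random:", dict(simple))
```

With this made-up population, proportional allocation gives the rare class only 3 of 300 samples, equal allocation gives it 100, and simple random sampling leaves the rare-class count to chance.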