Forward feature selection based on a random forest model

ffs_train is a wrapper function that simplifies forward feature selection when training random forest classification models. The approach is particularly suitable for leave-location-out cross-validation, where variable selection must be based on the model's performance on the held-out location. See Meyer et al. (2018) for further details. This applies in particular when vegetation patterns that vary in space and time are used for classification. For UAV-based RGB/NIR imagery, the function provides a preconfiguration optimized for this classification task.
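The leave-location-out idea can be sketched in base R. This is only an illustration of how folds follow from a location column, not the code ffs_train runs internally; the toy data and the image names are made up, while the column names follow the function's defaults.

```r
# Toy training table; column names follow the defaults of ffs_train
# (response "ID", predictors "R", "G", "B", location variable "FN").
# The values and image names are invented for illustration.
set.seed(100)
trainingDF <- data.frame(
  ID = factor(rep(c("veg", "soil"), times = 6)),
  R  = runif(12),
  G  = runif(12),
  B  = runif(12),
  FN = rep(c("img_a", "img_b", "img_c"), each = 4),
  stringsAsFactors = FALSE
)

# One fold per location: the test rows are all rows of that location,
# the training rows are all rows of the remaining locations.
locations <- unique(trainingDF$FN)
folds <- lapply(locations, function(loc) {
  list(
    train = which(trainingDF$FN != loc),
    test  = which(trainingDF$FN == loc)
  )
})
names(folds) <- locations

# Each row is held out exactly once across all folds.
held_out <- sort(unlist(lapply(folds, `[[`, "test"), use.names = FALSE))
```

Because a whole location is withheld in each fold, the performance estimate reflects how well the model transfers to an image it has never seen, which is the quantity the variable selection is meant to optimize.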

ffs_train(
  trainingDF = NULL,
  predictors = c("R", "G", "B"),
  response = "ID",
  spaceVar = "FN",
  names = c("ID", "R", "G", "B", "A", "FN"),
  noLoc = NULL,
  sumFunction = "twoClassSummary",
  pVal = 0.5,
  prefin = "final_",
  preffs = "ffs_",
  modelSaveName = "model.RData",
  runtest = FALSE,
  seed = 100,
  withinSE = TRUE,
  mtry = 2,
  noClu = 1
)

Arguments

trainingDF: data frame. contains the training data
predictors: character. vector of predictor names as given by the header of the training data table
response: character. name of the response variable as given by the header of the training data table
spaceVar: character. name of the space-time splitting variable as given by the header of the training data table
names: character. all names of the data frame header
noLoc: numeric. number of locations to leave out; usually the number of discrete training locations/images
sumFunction: character. name of the summary function; default is "twoClassSummary"
pVal: numeric. proportion of the training data that is used; default is 0.5
prefin: character. name pattern used for the model; default is "final_"
preffs: character. name pattern used for ffs; default is "ffs_"
modelSaveName: character. name pattern used for saving the model; default is "model.RData"
runtest: logical. if TRUE, an external validation is performed; default is FALSE
seed: numeric. seed for the random number generator
withinSE: logical. compares the performance to models that use fewer variables (e.g. if a model using 5 variables is better than a model using 4 variables but still within the standard error of the 4-variable model, then the 4-variable model is rated as the better model)
mtry: numeric. number of variables randomly sampled as candidates at each split
noClu: numeric. number of clusters to use
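The withinSE rule can be illustrated with a small base R sketch. The helper select_within_se and the performance values are hypothetical and only mirror the rule as described above (a larger model must beat the current best model by more than its standard error); they are not taken from the package source.

```r
# Hypothetical helper illustrating a withinSE-style selection rule.
# perf: performance per candidate model, ordered by increasing number of
#       variables (higher is better, e.g. ROC).
# se:   standard error of each performance value.
# Returns the position of the selected model in perf.
select_within_se <- function(perf, se, withinSE = TRUE) {
  best <- 1
  for (i in seq_along(perf)[-1]) {
    # With withinSE, a larger model must exceed the best model's
    # performance plus its standard error to be accepted.
    threshold <- if (withinSE) perf[best] + se[best] else perf[best]
    if (perf[i] > threshold) best <- i
  }
  best
}

# Made-up values for five candidate models of increasing size.
perf <- c(0.80, 0.86, 0.90, 0.905, 0.91)
se   <- c(0.02, 0.02, 0.02, 0.02, 0.02)

chosen_se    <- select_within_se(perf, se, withinSE = TRUE)   # 3
chosen_plain <- select_within_se(perf, se, withinSE = FALSE)  # 5
```

With withinSE = TRUE the fourth and fifth models are rejected because their small gains stay within the standard error of the third model, so the more parsimonious model is kept.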

Value

A random forest classification model trained with forward feature selection.

Note

The workflow of uavRst is intended to use forward feature selection as described by Meyer et al. (2018). This approach needs at least a pair of images that differ in time and/or space for leave-one-location-out validation. You can work around this requirement by tiling your image and providing separate training data for each tile. If you just want to classify a single image with a single training file, use the normal procedure provided by the trainControl function.
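The tiling workaround can be sketched as follows, assuming the training pixels of a single image carry pixel coordinates. The columns x and y, the image size, and the tile size are assumptions made for this illustration; only the "FN" column name comes from the function's defaults.

```r
# Toy training pixels from a single 100 x 100 image.
# x and y are hypothetical pixel coordinate columns.
set.seed(100)
pixels <- data.frame(
  x  = sample(0:99, 40, replace = TRUE),
  y  = sample(0:99, 40, replace = TRUE),
  ID = factor(sample(c("veg", "soil"), 40, replace = TRUE))
)

# Split the image into a 2 x 2 grid of 50-pixel tiles.
tile_size <- 50
tile_col <- pixels$x %/% tile_size
tile_row <- pixels$y %/% tile_size

# Use the tile label as the location variable, so that each tile
# plays the role of a separate training location.
pixels$FN <- paste0("tile_", tile_row, "_", tile_col)

# Number of training pixels per tile.
table(pixels$FN)
```

Each tile should contain training samples for every class; otherwise a fold that holds out that tile cannot be evaluated for the missing class.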
