ffs_train is a wrapper function for a simple use of forward feature selection when training random forest classification models. This validation is particularly suitable for leave-location-out cross-validation, where variable selection must be based on the performance of the model on the held-out station. See Meyer et al. (2018) for further details. This is the case when classifying vegetation patterns that vary in time and space. For UAV-based RGB/NIR imagery, it provides an optimized preconfiguration for the classification goals.

ffs_train(
  trainingDF = NULL,
  predictors = c("R", "G", "B"),
  response = "ID",
  spaceVar = "FN",
  names = c("ID", "R", "G", "B", "A", "FN"),
  noLoc = NULL,
  sumFunction = "twoClassSummary",
  pVal = 0.5,
  prefin = "final_",
  preffs = "ffs_",
  modelSaveName = "model.RData",
  runtest = FALSE,
  seed = 100,
  withinSE = TRUE,
  mtry = 2,
  noClu = 1
)

Arguments

trainingDF

data frame. contains the training data

predictors

character. vector of predictor names as given by the header of the training data table

response

character. name of response variable as given by the header of the training data table

spaceVar

character. name of the space-time splitting variable as given by the header of the training data table

names

character. all column names of the data frame header

noLoc

numeric. number of locations to leave out; usually the number of discrete training locations/images

sumFunction

character. name of the function used to summarize model performance; default is "twoClassSummary"

pVal

numeric. fraction of the training data that is used; default is 0.5

prefin

character. name pattern used for the model; default is "final_"

preffs

character. name pattern used for ffs; default is "ffs_"

modelSaveName

character. name pattern used for saving the model; default is "model.RData"

runtest

logical. default is FALSE; if set to TRUE, an external validation is performed

seed

numeric. number for seeding

withinSE

logical. compares the performance to models that use fewer variables (e.g. if a model using 5 variables is better than a model using 4 variables but still within the standard error of the 4-variable model, then the 4-variable model is rated as the better model).

mtry

numeric. number of variables randomly sampled as candidates at each split

noClu

numeric. number of clusters to be used
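
The selection rule described for withinSE can be illustrated with a small base R sketch. This is only one reading of that description, not the code that ffs_train runs internally; the function `select_within_se` and the result table are hypothetical and made up for illustration.

```r
# Hypothetical cross-validated results for models of increasing size.
# 'perf' is the mean performance (higher is better), 'se' its standard error.
results <- data.frame(
  nvar = c(2, 3, 4, 5),
  perf = c(0.80, 0.85, 0.90, 0.91),
  se   = c(0.03, 0.02, 0.02, 0.02)
)

# Hypothetical helper: walk from the smallest to the largest model and
# accept a larger model only if it beats the currently selected model
# by more than that model's standard error.
select_within_se <- function(results) {
  results <- results[order(results$nvar), ]
  best <- 1
  for (i in seq_len(nrow(results))[-1]) {
    if (results$perf[i] > results$perf[best] + results$se[best]) {
      best <- i
    }
  }
  results[best, ]
}

selected <- select_within_se(results)
selected$nvar
```

In this toy table the 5-variable model scores 0.91, which is higher than the 4-variable model (0.90) but still inside its standard error (0.90 + 0.02), so the 4-variable model is kept.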

Value

model of a forward feature selection driven random forest classification

Note

The workflow of uavRst is intended to use forward feature selection as described by Meyer et al. (2018). This approach needs at least a pair of images that differ in time and/or space for leave-one-location-out validation. You may work around this requirement by tiling your image and providing separate training data for each tile. If you just want to classify a single image with a single training file, use the normal procedure as provided by the trainControl function.
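
To make the leave-one-location-out idea concrete, the following base R sketch builds one fold per location from the spaceVar column. It assumes a toy training table that reuses the column names from the usage above ("ID", "R", "G", "B", "FN"); the values, the class labels, and the helper `make_llo_folds` are hypothetical, and the sketch does not claim to reproduce how ffs_train builds its folds internally.

```r
# Toy training table; all values are made up for illustration.
set.seed(100)
trainingDF <- data.frame(
  ID = rep(c("tree", "grass"), times = 6),
  R  = runif(12),
  G  = runif(12),
  B  = runif(12),
  FN = rep(c("img1", "img2", "img3"), each = 4)
)

# Hypothetical helper: one fold per location. Each fold holds out all
# rows of one image (test) and keeps the rows of all other images (train).
make_llo_folds <- function(df, spaceVar) {
  locations <- as.character(unique(df[[spaceVar]]))
  folds <- lapply(locations, function(loc) {
    list(
      train = which(df[[spaceVar]] != loc),
      test  = which(df[[spaceVar]] == loc)
    )
  })
  names(folds) <- locations
  folds
}

folds <- make_llo_folds(trainingDF, "FN")
length(folds)      # one fold per image
folds$img1$test    # row indices held out in the first fold
```

With only one image there is a single location and therefore nothing left to train on once it is held out, which is why the note suggests tiling the image so that each tile acts as its own location.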

Examples