SplitClassifier

See Javadoc for complete documentation of this class.

Usage: *No options*

Available Operations

These commands can be used to perform a variety of tasks, ranging from defining important settings about the object to actually using it.

clone – Create a copy of this model

normalize [attributes] [class] <method> [<options...>] – Define how to normalize data (data is not normalized by default)
attributes: Whether to normalize attributes
class: Whether to normalize class variable
method: Method used to normalize attributes
options...: Any options for the normalizer

output = crossvalidate $<dataset> <split size> [<n repeats>] – Cross-validation by repeatedly splitting the dataset into training and test sets
dataset: Dataset to use for cross validation
split size: Fraction of entries used in test set
n repeats: Number of times to repeat test
output: Dataset containing the results used to compute performance statistics
Same command structure as k-fold cross-validation. Runs if the value given for the number of folds is less than 1.
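
As an illustration of the repeated-split procedure, here is a minimal Python sketch. It is not the code behind this command: the function name, the `evaluate` callback, the seed handling, and the rounding of the test-set size are all assumptions made for the example.

```python
import random


def repeated_split_validation(entries, test_fraction, n_repeats, evaluate, seed=0):
    """Hold out `test_fraction` of the entries, score on them, and repeat.

    `evaluate(train, test)` is a hypothetical callback that trains on `train`
    and returns a score measured on `test`.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n_test = max(1, round(len(entries) * test_fraction))
    scores = []
    for _ in range(n_repeats):
        shuffled = entries[:]  # copy so the caller's list is not reordered
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        scores.append(evaluate(train, test))
    return scores


# Usage: report the size of each train/test pair for 20 entries, 25% held out.
sizes = repeated_split_validation(
    list(range(20)), 0.25, 3, lambda train, test: (len(train), len(test))
)
```

Each repeat draws a fresh random split, so the scores vary between repeats and are usually summarized by their mean and spread.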

output = crossvalidate $<dataset> [<folds>] – Use k-fold cross-validation to assess model performance.
dataset: Dataset to use for cross validation
folds: Number of cross validation folds (default = 10)
output: Dataset containing the results used to compute performance statistics
Splits the dataset into <folds> parts. Trains the model on <folds> - 1 parts and validates against the remaining part. Repeats so that each part serves as the validation set once.
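
The fold rotation described above can be sketched in Python as follows. This is a conceptual example only: the function name is hypothetical, and the entries are assigned to parts by simple striding without shuffling, which may differ from what this command actually does.

```python
def k_fold_splits(entries, folds=10):
    """Yield (train, validation) pairs; each part is the validation set once."""
    if folds < 2 or folds > len(entries):
        raise ValueError("folds must be between 2 and the number of entries")
    # Assumption: entries are dealt into parts round-robin, with no shuffling.
    parts = [entries[i::folds] for i in range(folds)]
    for i, validation in enumerate(parts):
        # Train on every part except the one held out for validation.
        train = [e for j, part in enumerate(parts) if j != i for e in part]
        yield train, validation


# Usage: 10 entries split into 5 folds gives 5 train/validation pairs.
splits = list(k_fold_splits(list(range(10)), folds=5))
```

Every entry appears in exactly one validation set, so the combined validation predictions cover the whole dataset.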

run $<dataset> – Use model to predict class values for each entry
dataset: Dataset to evaluate

set filter $<filter> – Define the BaseDatasetFilter used to filter data before attribute normalization, attribute selection, and model training.
filter: Filter to use

set selector $<selector> – Define the BaseAttributeSelector used to screen attributes before training
selector: Attribute selector to use

splitter <method> [<options...>] – Define splitter used to partition dataset between models
method: Method used to split data. Name of a BaseDatasetSplitter ("?" for options)
options: Any options for the splitter
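
To show how a splitter and a list of submodels fit together, here is a minimal Python sketch. All class and function names are hypothetical, the splitter is assumed to map an entry to a submodel index, and the toy submodel simply predicts the majority class; none of this is taken from the actual implementation.

```python
from collections import Counter


class MajorityModel:
    """Toy submodel: predicts the most common class seen during training."""

    def __init__(self):
        self.prediction = None

    def train(self, rows):
        self.prediction = Counter(label for _, label in rows).most_common(1)[0][0]

    def run(self, features):
        return self.prediction


class SplitModelSketch:
    """Routes each entry to one submodel, chosen by the splitter."""

    def __init__(self, splitter, n_partitions):
        self.splitter = splitter  # assumed: features -> submodel index
        self.submodels = [MajorityModel() for _ in range(n_partitions)]

    def train(self, rows):
        groups = [[] for _ in self.submodels]
        for features, label in rows:
            groups[self.splitter(features)].append((features, label))
        for model, group in zip(self.submodels, groups):
            if group:  # skip partitions that received no training entries
                model.train(group)

    def run(self, features):
        return self.submodels[self.splitter(features)].run(features)


# Usage: entries below 0.5 go to submodel 0, the rest to submodel 1.
rows = [((0.1,), "a"), ((0.2,), "a"), ((0.9,), "b"), ((0.8,), "b"), ((0.7,), "a")]
model = SplitModelSketch(lambda f: 0 if f[0] < 0.5 else 1, 2)
model.train(rows)
low, high = model.run((0.3,)), model.run((0.95,))
```

The same splitter is applied during training and prediction, so an entry is always evaluated by the submodel trained on its own partition.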

output = submodel get <number> – Retrieve a specific submodel
number: Index of submodel to retrieve (list starts with 0)
Returns a clone of the model, so you cannot use this command to edit the submodel in place.
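
The consequence of receiving a clone can be sketched in Python. The holder class and its methods are hypothetical stand-ins, plain dictionaries play the role of models, and storing a copy on set is an assumption of this example rather than documented behavior.

```python
import copy


class SubmodelHolder:
    """Hypothetical container that hands out copies of its submodels."""

    def __init__(self, submodels):
        self._submodels = list(submodels)

    def get_submodel(self, number):
        # A deep copy is returned, so edits to it never reach the stored model.
        return copy.deepcopy(self._submodels[number])

    def set_submodel(self, number, model):
        # Assumption for this sketch: the holder stores its own copy.
        self._submodels[number] = copy.deepcopy(model)


holder = SubmodelHolder([{"depth": 3}, {"depth": 5}])
retrieved = holder.get_submodel(0)
retrieved["depth"] = 10                       # edits only the clone
unchanged = holder.get_submodel(0)["depth"]   # stored submodel is untouched
holder.set_submodel(0, retrieved)             # explicit write-back is required
updated = holder.get_submodel(0)["depth"]
```

In other words, changing a submodel takes three steps: retrieve it, modify the copy, then assign it back with the corresponding set command.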

output = submodel get generic – Retrieve the template for any unassigned submodels

submodel set <number> $<model> – Set a specific submodel
number: Index of the submodel to set (list starts with 0)
model: An instance of BaseModel to use for that model

submodel set generic $<model> – Define a model template to use for all submodels
model: An instance of BaseModel
Note: Do not use this command for CompositeRegression unless each model automatically uses a different random number seed. Otherwise, each submodel will be identical.

submodel – Print the number of submodels

train $<dataset> – Train model using measured class values
dataset: Dataset used to train this model

validate $<dataset> – Validate model against external dataset
dataset: Dataset to use for validation

Available Print Commands

These commands are run by calling "print <variable name> <command> [<options>]". Any output from that command will be printed to standard output.

description – Print out short description of this model.

model – Print out the model

selector – Print out attributes selected by the internal BaseAttributeSelector, if defined

splitter – Print out the name of splitter used by this model

submodel <number> [<command...>] – Pass a print command to one of the submodels
number: Index of model to operate on (starts at 0)
command: Print command that gets passed to that submodel

submodel – Print out number of submodels

training [<command>] – Print out statistics generated during training
command: Command to be passed to internal BaseStatistics object.

validation [<command>] – Print out statistics generated during validation
command: Command to be passed to internal BaseStatistics object.

Available Save Formats

Variables of this type can be saved in the following formats:

training – Print out performance data for training set

validation – Print out performance data for validation set