public abstract class BaseModel extends java.lang.Object implements java.io.Serializable, java.lang.Cloneable, Options, Printable, Commandable, Savable, Citable
All models support the ability to filter training data, normalize attributes, and select attributes before model training (in that order).
Implementation Guide
Operations that must be implemented:
Implemented Commands:
clone - Create a copy of this model
train $<dataset> - Train model using measured class values
output = crossvalidate $<dataset> [<folds>] - Use k-fold cross-validation to assess model performance.
Splits dataset into folds parts. Trains model on folds - 1 parts, validates against the remaining part. Repeats using each part as the validation set.
output = crossvalidate $<dataset> <split size> [<n repeats>] - Cross-validation by splitting dataset into train and test sets. Test is repeated multiple times.
Same command structure as k-fold cross-validation. Runs if the number of folds is less than 1.
run $<dataset> - Use model to predict class values for each entry
validate $<dataset> - Validate model against external dataset
set selector $<selector> - Define the BaseAttributeSelector used to screen attributes before training
set selector $<filter> - Define the BaseDatasetFilter used to filter data before attribute normalization, attribute selection, and model training.
normalize [attributes] [class] <method> [<options...>] - Define how to normalize data (data is not normalized by default)
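The k-fold procedure described for the crossvalidate command (split into folds parts, train on folds - 1 parts, validate on the remaining part, repeat for each part) can be sketched roughly as follows. This is a minimal illustration on entry indices only, not the code used by this class: KFoldSketch and its split method are hypothetical names, and the shuffle-then-deal partitioning is an assumption about how the folds might be formed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper; not part of the BaseModel API. */
public class KFoldSketch {

    /**
     * Shuffle entry indices 0..nEntries-1 with a seed and deal them into
     * `folds` parts whose sizes differ by at most one.
     */
    public static List<List<Integer>> split(int nEntries, int folds, long seed) {
        if (folds < 2 || folds > nEntries) {
            throw new IllegalArgumentException("folds must be in [2, nEntries]");
        }
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < nEntries; i++) {
            order.add(i);
        }
        Collections.shuffle(order, new Random(seed));

        List<List<Integer>> parts = new ArrayList<>();
        for (int f = 0; f < folds; f++) {
            parts.add(new ArrayList<>());
        }
        // Deal shuffled indices round-robin into the parts.
        for (int i = 0; i < order.size(); i++) {
            parts.get(i % folds).add(order.get(i));
        }
        return parts;
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = split(10, 3, 42L);
        // Each part takes one turn as the validation set.
        for (int test = 0; test < parts.size(); test++) {
            List<Integer> train = new ArrayList<>();
            for (int f = 0; f < parts.size(); f++) {
                if (f != test) {
                    train.addAll(parts.get(f));
                }
            }
            // A real run would train on `train` and validate on parts.get(test).
            System.out.println("fold " + test + ": train=" + train.size()
                    + " test=" + parts.get(test).size());
        }
    }
}
```

Passing the seed explicitly mirrors the crossValidate(int folds, Dataset cvData, long seed) overload, so that a given split can be reproduced.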
Implemented Print Commands
description - Print out short description of this model.
model - Print out the model
validation [<command>] - Print out statistics generated during validation
training [<command>] - Print out statistics generated during training
selector - Print out attributes selected by internal BaseAttributeSelector, if defined
Implemented Save Commands
training - Print out performance data for training set
validation - Print out performance data for validation set
Modifier and Type | Field and Description |
---|---|
protected BaseAttributeSelector | AttributeSelector - BaseAttributeSelector used to screen attributes during training |
protected boolean | trained - Records whether model has been trained |
BaseStatistics | TrainingStats - Statistics about performance on training set |
protected boolean | validated - Records whether model has been validated |
BaseStatistics | ValidationStats - Statistics generated during model validation |
Constructor and Description |
---|
BaseModel() |
Modifier and Type | Method and Description |
---|---|
java.lang.String | about() - Prints a simple status message about this object |
BaseModel | clone() |
Dataset | crossValidate(double testFraction, int nRepeats, Dataset data, long seed) - Run a cross-validation test where the dataset is randomly partitioned into a training and test set. |
Dataset | crossValidate(int folds, Dataset cvData) - Perform k-fold cross validation |
Dataset | crossValidate(int folds, Dataset cvData, long seed) - Perform k-fold cross validation |
void | done() - Run if done with a model, clears any external resources. |
void | externallyValidate(Dataset testData) - Use external testing data to validate a model (should not contain any data used to train the model) |
BaseAttributeSelector | getAttributeSelector() - Return the BaseAttributeSelector used by this model |
java.util.List<org.apache.commons.lang3.tuple.Pair<java.lang.String,Citation>> | getCitations() - Return a list of citations for this object and any underlying objects. |
BaseDatasetFilter | getFilter() - Get filter used before training |
java.util.Date | getTrainTime() - Return when this model was trained |
java.lang.String | getValidationMethod() - Get a description of how this model was validated |
void | handleSetCommand(java.util.List<java.lang.Object> Command) - Handle setting components of a model via the command interface |
boolean | isTrained() |
boolean | isValidated() |
static BaseModel | loadState(java.lang.String filename) - Read the state from file using serialization |
java.lang.String | printCommand(java.util.List<java.lang.String> Command) - Handles more complicated printing commands. |
java.lang.String | printDescription(boolean htmlFormat) - Print full name of object, and a simple description of the options. |
protected abstract java.lang.String | printModel_protected() - Internal method that handles printing the model as a string. |
java.lang.String | printModel() - If a model is trained, return it formatted as a string. |
protected java.util.List<java.lang.String> | printModelDescriptionDetails(boolean htmlFormat) - Print details of the model. |
void | resetModel() - Mark this model as untrained and unvalidated |
abstract void | run_protected(Dataset TrainData) - Run a model without checking if stuff is trained (use carefully) |
void | run(Dataset runData) - Run a model on provided data. |
java.lang.Object | runCommand(java.util.List<java.lang.Object> command) - Process some command described by a list of Objects. |
java.lang.String | saveCommand(java.lang.String Basename, java.lang.String Format) - Handles complicated saving commands. |
void | saveState(java.lang.String filename) - Save the state of this object using serialization |
void | setAttributeSelector(BaseAttributeSelector AttributeSelector) - Define an attribute selector that will force this model to only use a subset of the attributes supplied with a Dataset |
void | setComponent(java.lang.String Name, java.lang.Object Object) - Set a specific component of a model |
void | setFilter(BaseDatasetFilter filter) - Set filter used to clean data before training |
protected abstract void | train_protected(Dataset TrainData) - Train a model without evaluating performance |
void | train(Dataset TrainingData) - Train a model on a specified training set and then evaluate performance on the training set. |
void | train(Dataset data, boolean recordStats) - Train a model on a specified training set and then evaluate performance on the training set, if desired |
Methods inherited from class java.lang.Object: equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Options: printUsage, setOptions
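The method summary marks train_protected, run_protected, and printModel_protected as abstract, while train, run, and printModel are the public entry points. The sketch below mirrors that split with a hypothetical stand-in base class (SketchModel) operating on plain double[] values rather than Dataset. It is not BaseModel itself: the return types, the "Model not trained" message, and the exact checks performed by the public methods are assumptions made for illustration.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the public/abstract method split; not BaseModel. */
abstract class SketchModel {
    protected boolean trained = false;

    /** Public entry point: delegate to the subclass, then record state. */
    public void train(double[] classValues) {
        train_protected(classValues);
        trained = true;
    }

    /** Public entry point: refuse to run before training. */
    public double[] run(int nEntries) {
        if (!trained) {
            throw new IllegalStateException("Model not trained");
        }
        return run_protected(nEntries);
    }

    /** Only print the model once it has been trained. */
    public String printModel() {
        return trained ? printModel_protected() : "Model not trained";
    }

    protected abstract void train_protected(double[] classValues);

    public abstract double[] run_protected(int nEntries);

    protected abstract String printModel_protected();
}

/** Toy model that predicts the mean of the training class values. */
class MeanModel extends SketchModel {
    private double mean;

    @Override
    protected void train_protected(double[] classValues) {
        double sum = 0.0;
        for (double v : classValues) {
            sum += v;
        }
        mean = sum / classValues.length;
    }

    @Override
    public double[] run_protected(int nEntries) {
        double[] out = new double[nEntries];
        Arrays.fill(out, mean);
        return out;
    }

    @Override
    protected String printModel_protected() {
        return "class = " + mean;
    }
}

public class TemplateSketch {
    public static void main(String[] args) {
        SketchModel model = new MeanModel();
        System.out.println(model.printModel());
        model.train(new double[] {1.0, 2.0, 3.0});
        System.out.println(model.printModel());
        System.out.println(Arrays.toString(model.run(2)));
    }
}
```

Keeping the state checks in the public methods means a subclass only supplies the model-specific logic, which matches the "use carefully" warning attached to run_protected.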
public BaseStatistics TrainingStats
public BaseStatistics ValidationStats
protected boolean trained
protected boolean validated
protected BaseAttributeSelector AttributeSelector
public static BaseModel loadState(java.lang.String filename) throws java.lang.Exception
Parameters:
filename - Filename for input
Throws:
java.lang.Exception - If parsing fails
public boolean isTrained()
public java.util.Date getTrainTime()
public boolean isValidated()
public void resetModel()
public BaseAttributeSelector getAttributeSelector()
public void setAttributeSelector(BaseAttributeSelector AttributeSelector)
Parameters:
AttributeSelector - Untrained BaseAttributeSelector
public BaseDatasetFilter getFilter()
public void setFilter(BaseDatasetFilter filter)
Parameters:
filter - Desired filter
public Dataset crossValidate(int folds, Dataset cvData)
Parameters:
folds - Number of folds in CV test
cvData - Data to use for CV
public Dataset crossValidate(int folds, Dataset cvData, long seed)
Parameters:
folds - Number of folds in CV test
cvData - Data to use for CV
seed - Random seed used when splitting dataset
public Dataset crossValidate(double testFraction, int nRepeats, Dataset data, long seed)
For data with a discrete class, we ensure that the distribution of classes is the same in the train and test sets.
Parameters:
testFraction - Fraction of entries in the test set
nRepeats - Number of times test is repeated
data - Dataset used for cross-validation
seed - Random seed
public void externallyValidate(Dataset testData)
Parameters:
testData - External test dataset
public java.lang.String getValidationMethod()
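The note on crossValidate(double testFraction, int nRepeats, Dataset data, long seed) says that, for a discrete class, the class distribution is kept the same in the train and test sets. One common way to achieve that is to draw the test entries separately within each class. The sketch below shows this on an array of integer labels; StratifiedSplitSketch and testIndices are hypothetical names, and per-class sampling with rounding is an assumption, not necessarily how this class does it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

/** Hypothetical helper illustrating a class-stratified train/test split. */
public class StratifiedSplitSketch {

    /**
     * Return the indices chosen for the test set. Entries are grouped by
     * discrete class label, and roughly `testFraction` of each group is drawn.
     */
    public static List<Integer> testIndices(int[] labels, double testFraction, Random rng) {
        if (testFraction <= 0.0 || testFraction >= 1.0) {
            throw new IllegalArgumentException("testFraction must be in (0, 1)");
        }
        // TreeMap keeps the class order deterministic for a given seed.
        Map<Integer, List<Integer>> byClass = new TreeMap<>();
        for (int i = 0; i < labels.length; i++) {
            byClass.computeIfAbsent(labels[i], k -> new ArrayList<>()).add(i);
        }
        List<Integer> test = new ArrayList<>();
        for (List<Integer> members : byClass.values()) {
            Collections.shuffle(members, rng);
            int nTest = (int) Math.round(testFraction * members.size());
            test.addAll(members.subList(0, nTest));
        }
        return test;
    }

    public static void main(String[] args) {
        int[] labels = {0, 0, 0, 0, 0, 0, 1, 1, 1, 1};
        Random rng = new Random(1L);
        int nRepeats = 3;
        // Each repeat draws a fresh random split from the same generator.
        for (int r = 0; r < nRepeats; r++) {
            List<Integer> test = testIndices(labels, 0.5, rng);
            // A real run would train on the remaining entries and evaluate on `test`.
            System.out.println("repeat " + r + ": test entries " + test);
        }
    }
}
```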
public void train(Dataset TrainingData)
Parameters:
TrainingData - Dataset used for training
public void train(Dataset data, boolean recordStats)
Parameters:
data - Dataset to use for training
recordStats - Whether to record training statistics
public void run(Dataset runData)
Parameters:
runData - Dataset to evaluate.
public void saveState(java.lang.String filename) throws java.lang.Exception
Parameters:
filename - Filename for output
Throws:
java.lang.Exception
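saveState and loadState are described as using serialization, and the class implements java.io.Serializable. A round trip through standard Java object streams might look like the sketch below. ToyModel and StateSketch are hypothetical stand-ins; whether BaseModel uses plain ObjectOutputStream and ObjectInputStream internally is an assumption.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class StateSketch {

    /** Hypothetical serializable object standing in for a trained model. */
    static class ToyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        double coefficient;
        boolean trained;
    }

    /** Write the object to `filename` using Java serialization. */
    static void saveState(Object obj, String filename) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(filename))) {
            out.writeObject(obj);
        }
    }

    /** Read an object back from `filename`. */
    static Object loadState(String filename) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(filename))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ToyModel model = new ToyModel();
        model.coefficient = 1.5;
        model.trained = true;

        File tmp = File.createTempFile("model", ".obj");
        tmp.deleteOnExit();
        saveState(model, tmp.getPath());

        ToyModel restored = (ToyModel) loadState(tmp.getPath());
        System.out.println(restored.coefficient + " " + restored.trained);
    }
}
```

Every field reachable from the saved object must itself be serializable (or marked transient), which is worth keeping in mind when a subclass holds external resources.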
public BaseModel clone()
Overrides:
clone in class java.lang.Object
protected abstract void train_protected(Dataset TrainData)
Parameters:
TrainData - Training data
public abstract void run_protected(Dataset TrainData)
Parameters:
TrainData - Training data
public java.lang.String about()
Description copied from interface: Printable
public java.lang.String printModel()
protected abstract java.lang.String printModel_protected()
public java.lang.String printDescription(boolean htmlFormat)
Description copied from interface: Printable
Example: For a model training a separate WekaRegression for intermetallics
magpie.models.regression.SplitRegression
Specified by:
printDescription in interface Printable
Parameters:
htmlFormat - Whether to format for output to an HTML page (e.g., <div> to create indentation) or for printing to screen.
See Also:
printModel()
protected java.util.List<java.lang.String> printModelDescriptionDetails(boolean htmlFormat)
printDescription(boolean).
Implementation note: Do not add indentation for details; that is handled by printDescription(boolean). You should also call the super operation to get the Normalizer and Attribute selector settings.
Parameters:
htmlFormat - Whether to use HTML format
public java.lang.String printCommand(java.util.List<java.lang.String> Command) throws java.lang.Exception
Description copied from interface: Printable
Specified by:
printCommand in interface Printable
Parameters:
Command - Command specifying what to print
Throws:
java.lang.Exception - If command not understood
public java.lang.Object runCommand(java.util.List<java.lang.Object> command) throws java.lang.Exception
Description copied from interface: Commandable
Specified by:
runCommand in interface Commandable
Parameters:
command - Command as a list of objects
Throws:
java.lang.Exception - If something goes wrong
public void handleSetCommand(java.util.List<java.lang.Object> Command) throws java.lang.Exception
Parameters:
Command - Command to be executed.
Throws:
java.lang.Exception
public void setComponent(java.lang.String Name, java.lang.Object Object) throws java.lang.Exception
Parameters:
Name - Name of component
Object - Instance of component (will be cloned)
Throws:
java.lang.Exception
public java.lang.String saveCommand(java.lang.String Basename, java.lang.String Format) throws java.lang.Exception
Savable
Dev Note: Make sure to add save format to Javadoc. See Dataset as an example. Required format:
<save><p><b>format<b> - Description
<br>Optional room to talk more about format </save>
saveCommand
in interface Savable
Basename
- Name of file without extensionFormat
- Command specifying format in which to printjava.lang.Exception
- If command not understoodpublic java.util.List<org.apache.commons.lang3.tuple.Pair<java.lang.String,Citation>> getCitations()
Citable
getCitations
in interface Citable
public void done()