See Javadoc for complete documentation of this class.
Usage: *No options*
These commands can be used to perform a variety of tasks, ranging from defining important settings about the object to actually using it.
<output> = clone [-empty] – Create a copy of this dataset
-empty: Do not copy entries from dataset into clone
<output> = split <number|fraction> – Randomly select and remove entries from dataset
number|fraction: Either the fraction or number of entries to be removed
output: New dataset containing randomly selected entries that were in this dataset
<output> = subset <number|fraction> – Generate a random subset from this dataset
number|fraction: Either the fraction or number of entries to select
output: New dataset containing random selection from this dataset
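The difference between split and subset is easy to miss: both return a random selection, but only split removes the selected entries from the original dataset. The following Python sketch illustrates that behavior on plain lists. It is not the class's implementation; the helper names are hypothetical, and the rule that values below 1 are read as fractions is an assumption made for the example.

```python
import random

def _count(size, amount):
    # Assumption: values below 1 are fractions of the dataset, otherwise entry counts.
    return int(round(size * amount)) if amount < 1 else int(amount)

def subset(entries, amount, rng):
    """Return a random selection; the source list is left untouched."""
    return rng.sample(entries, _count(len(entries), amount))

def split(entries, amount, rng):
    """Return a random selection and remove those entries from the source list."""
    chosen = set(rng.sample(range(len(entries)), _count(len(entries), amount)))
    taken = [e for i, e in enumerate(entries) if i in chosen]
    entries[:] = [e for i, e in enumerate(entries) if i not in chosen]
    return taken

rng = random.Random(0)
data = ["NaCl", "Fe2O3", "TiO2", "Al2O3", "MgO", "SiO2", "ZnO", "CaO", "CuO", "NiO"]
picked = subset(data, 0.3, rng)   # data still holds all 10 entries
held_out = split(data, 4, rng)    # data now holds the 6 entries that were not selected
```

A typical use of split is to set aside a test set, since the remaining dataset and the output never share an entry.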
add $<dataset> [-force] – Add entries from another dataset
dataset: Dataset to be merged with this one
-force: Optional: Whether to force merge if attributes / classes / properties are different
If attributes, classes, or properties are different, attributes and class values in new entries (i.e., those from the other dataset) will be deleted and properties will be merged
add <entries...> – Add entries to a dataset
entries...: Strings describing entries to be added
attributes clear – Clear all attribute data
attributes composition <true|false> – Set whether to use composition as attributes
true|false: Whether to use composition as attributes
By default, this class does not use composition (by itself) as attributes
attributes expanders add <method> [<options...>] – Add an attribute expander to be run after generating attributes
method: How to expand attributes. Name of a BaseAttributeExpander
options...: Any options for the expansion method
These expanders are designed to create new attributes based on existing ones.
attributes expanders clear – Clear the current list of attribute expanders
attributes expanders run – Run the currently-defined list of attribute expanders
attributes generate – Generate attributes for each entry
attributes generators add <method> [<options...>] – Add a new attribute generator to list of generators
method: New generation method. Name of a BaseAttributeGenerator
options...: Any options for the generator method
These generators are designed to create new attributes tailored for a specific application.
attributes generators clear – Clear the current list of attribute generators
attributes generators run – Run the currently-defined list of attribute generators
attributes properties <directory> – Specify directory that contains the elemental property lookup files
directory: Desired directory
attributes properties add <names...> – Add elemental properties to use when generating attributes
attributes properties add set <name> – Add in all elemental properties from a pre-defined set
name: Name of the pre-defined set
attributes properties pair add <difference|ratio> <names...> – Add properties of element pairs generated by computing the difference or ratio of properties of each element in the pair
difference|ratio: Whether to compute the difference or ratio
names...: Names of elemental properties used to generate derivative attributes
To make the attribute insensitive to the order of elements in the pair, difference is the absolute difference and ratio is the minimum divided by the maximum.
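The order-insensitive definitions of difference and ratio can be written out as a short Python sketch. The function names are hypothetical and the values are Pauling electronegativities used only for illustration; the ratio assumes the larger value is nonzero.

```python
def pair_difference(a, b):
    """Absolute difference, so swapping the two elements gives the same value."""
    return abs(a - b)

def pair_ratio(a, b):
    """Minimum divided by maximum, so the result does not depend on element order."""
    low, high = min(a, b), max(a, b)
    return low / high  # assumes high != 0

na, cl = 0.93, 3.16  # illustrative property values for the two elements of a pair
difference = pair_difference(na, cl)
ratio = pair_ratio(na, cl)
```

For positive property values, the ratio defined this way always falls between 0 and 1.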
attributes properties pair add <names...> – Add properties of element pairs to use when generating attributes
attributes properties pair remove <names...> – Remove properties of element pairs from list of those used when generating attributes
names...: Name of properties to remove
attributes properties remove <names...> – Remove properties from list of those used when generating attributes
names...: Name of properties to remove
attributes rank <number> <method> [<options...>] – Rank attributes based on predictive power
number: Number of top attributes to print
method: Method used to rank attributes. Name of a BaseAttributeEvaluator
options...: Options for the evaluation method.
attributes – Print all attributes
combine $<dataset> – Add all entries from another dataset
dataset: Dataset to merge with this one. It will remain unchanged.
duplicates <resolver> [<resolver options>] – Eliminate duplicates within a dataset
resolver: Name of BaseDuplicateResolver used to handle duplicates
resolver options: Any options for the resolver
filter <include|exclude> <method> [<options...>] – Run dataset through a filter
include|exclude: Whether to keep only the entries that pass the filter (include) or to remove them (exclude)
method: Filtering method. Name of a BaseDatasetFilter
options...: Options for the filter
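As a rough illustration of the include and exclude modes, the Python sketch below applies a pass/fail test to a list of entries. The function name and the dictionary representation of an entry are assumptions for the example and do not correspond to any particular BaseDatasetFilter.

```python
def apply_filter(entries, passes, mode):
    """Keep passing entries for "include", or non-passing entries for "exclude"."""
    if mode not in ("include", "exclude"):
        raise ValueError("mode must be 'include' or 'exclude'")
    keep_passing = mode == "include"
    return [e for e in entries if passes(e) == keep_passing]

# Entries represented as element -> fraction mappings (hypothetical representation)
entries = [
    {"Na": 0.5, "Cl": 0.5},
    {"Fe": 0.4, "O": 0.6},
    {"Mg": 0.5, "O": 0.5},
]
contains_oxygen = lambda entry: "O" in entry

oxides = apply_filter(entries, contains_oxygen, "include")
non_oxides = apply_filter(entries, contains_oxygen, "exclude")
```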
generate <method> [<options>] – Generate new entries
method: Name of a BaseEntryGenerator.
options: Any options for the entry generator
import <filename> [<options...>] – Import data by reading a file
filename: Name of file to import data from
options...: Any options used when parsing this dataset (specific to type of Dataset)
match $<dataset> <num> – Find most similar entries in this dataset
dataset: Dataset containing entries to be matched
num: Number of closest entries to print
Prints the most similar entries in this dataset to those in the dataset passed as the argument.
modify <method> [<options>] – Modify the dataset
method: How to modify dataset. Name of a BaseDatasetModifier.
options: Any options for the modifier
rank <number> <maximum|minimum> <measured|predicted> <method> [<options>] – Print the top-ranked entries based on some measure
number: Number of top entries to print
maximum|minimum: Whether to print entries with the largest or smallest objective function
measured|predicted: Whether to use the measured or predicted values when calculating the objective function
method: Objective function used to rank entries. Name of a BaseEntryRanker
options...: Any options for the objective function
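The four arguments of rank interact as shown in the Python sketch below: the objective function is evaluated on either the measured or the predicted value of each entry, and the entries are then sorted in the requested direction. The function name, the dictionary layout, and the identity objective are all hypothetical choices for the example.

```python
def rank_entries(entries, number, direction, source, objective):
    """Return the top `number` (score, name) pairs.

    direction: "maximum" or "minimum"; source: "measured" or "predicted".
    """
    scored = [(objective(e[source]), e["name"]) for e in entries]
    scored.sort(key=lambda pair: pair[0], reverse=(direction == "maximum"))
    return scored[:number]

entries = [
    {"name": "A", "measured": 1.0, "predicted": 1.5},
    {"name": "B", "measured": 3.0, "predicted": 0.5},
    {"name": "C", "measured": 2.0, "predicted": 2.5},
]
identity = lambda value: value  # simplest possible objective: the value itself

top_measured = rank_entries(entries, 2, "maximum", "measured", identity)
low_predicted = rank_entries(entries, 2, "minimum", "predicted", identity)
```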
subtract $<data> – Subtract another dataset from this one
data: Data to be subtracted from this dataset
After this operation, this dataset will only contain entries that are not in data.
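Subtraction here behaves like a set difference applied in place. A minimal Python sketch, assuming entries can be compared for equality (plain strings stand in for entries, and the function name is hypothetical):

```python
def subtract(this, other):
    """Remove from `this`, in place, every entry that also appears in `other`."""
    to_remove = set(other)
    this[:] = [entry for entry in this if entry not in to_remove]

dataset = ["NaCl", "Fe2O3", "TiO2", "MgO"]
known = ["TiO2", "NaCl", "ZnO"]
subtract(dataset, known)  # `known` is left unchanged
```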
target <name> [-keep] – Set class variable to be a certain property
name: Name of property to use as class variable
-keep: Whether to keep entries without a measurement for this property
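The effect of -keep can be pictured with the Python sketch below. It assumes that, without -keep, entries lacking a measurement for the chosen property are dropped; the function name and the dictionary layout are hypothetical.

```python
def set_target(entries, name, keep=False):
    """Use property `name` as the class variable of each entry.

    Entries without a measurement for `name` are dropped unless keep is True
    (assumed default behavior for this illustration).
    """
    selected = []
    for entry in entries:
        value = entry["properties"].get(name)
        if value is None and not keep:
            continue
        selected.append({**entry, "class": value})
    return selected

entries = [
    {"name": "NaCl", "properties": {"bandgap": 8.5}},
    {"name": "Fe2O3", "properties": {}},
]
measured_only = set_target(entries, "bandgap")
everything = set_target(entries, "bandgap", keep=True)
```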
These commands are run by calling "print <variable name> <command> [<options>]". Any output from that command will be printed to standard output.
description – Print description of this dataset
details – Print details about this class
dist – Print distribution of entries between known classes
Variables of this type can be saved in the following formats:
arff – Weka's ARFF format.
Requires that a measured value is available for the class variable of each entry.
comp – All properties with composition written by element fraction
Very similar to the "prop" format
csv – Comma-separated value format.
The value of each attribute and the measured class variable, if defined.
json – Save dataset into JSON format
poscar – Save dataset as a directory full of POSCARs.
Properties of each entry will be saved in a file in that directory named "properties.txt"
prop – Print out the measured and predicted properties
stats – Writes predicted and measured class variables.
This is intended to allow an external program to evaluate model performance.
template – Save an empty clone of the dataset using serialization