ElementDataset

See Javadoc for complete documentation of this class.

Usage: *No options*

Available Operations

These commands can be used to perform a variety of tasks, ranging from defining important settings about the object to actually using it.

<output> = clone [-empty] – Create a copy of this dataset
-empty: Do not copy entries from dataset into clone

<output> = split <number|fraction> – Randomly select and remove entries from dataset
number|fraction: Either the fraction or number of entries to be removed
output: New dataset containing randomly selected entries that were in this dataset

<output> = subset <number|fraction> – Generate a random subset from this dataset
number|fraction: Either the fraction or number of entries to select
output: New dataset containing random selection from this dataset
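
The difference between split and subset is whether the selected entries are removed from the original dataset. The following is a minimal Python sketch of those semantics, not Magpie's implementation. The function names, the use of plain lists as datasets, and the rule that a value below 1 is read as a fraction are all assumptions made for illustration.

```python
import random


def _count(size, total):
    """Convert a number-or-fraction argument into an entry count.

    Assumption for this sketch: values below 1 are fractions of the
    dataset, values of 1 or more are absolute counts.
    """
    if size < 0:
        raise ValueError("size must be non-negative")
    count = int(round(size * total)) if size < 1 else int(size)
    return min(count, total)


def subset(entries, size, rng=random):
    """Return a random selection; the original list is left unchanged."""
    return rng.sample(entries, _count(size, len(entries)))


def split(entries, size, rng=random):
    """Remove a random selection from entries and return it as a new list."""
    chosen = set(rng.sample(range(len(entries)), _count(size, len(entries))))
    removed = [e for i, e in enumerate(entries) if i in chosen]
    # Modify the list in place so the caller sees the smaller dataset.
    entries[:] = [e for i, e in enumerate(entries) if i not in chosen]
    return removed
```

In this sketch, calling split on a list of 8 entries with a size of 3 leaves 5 entries behind, while subset with a size of 0.25 returns 2 entries and leaves all 8 in place.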

add $<dataset> [-force] – Add entries from another dataset
dataset: Dataset to be merged with this one
-force: Optional: Whether to force merge if attributes / classes / properties are different
If the attributes, classes, or properties differ, the attributes and class values of the new entries (i.e., those from the other dataset) will be deleted and the properties will be merged.

add <entries...> – Add entries to a dataset
entries...: Strings describing entries to be added

attributes clear – Clear all attribute data

attributes expanders add <method> [<options...>] – Add an attribute expander to be run after generating attributes
method: How to expand attributes. Name of a BaseAttributeExpander
options...: Any options for the expansion method
These expanders are designed to create new attributes based on existing ones.

attributes expanders clear – Clear the current list of attribute expanders

attributes expanders run – Run the currently-defined list of attribute expanders

attributes generate – Generate attributes for each entry

attributes generators add <method> [<options...>] – Add a new attribute generator to list of generators
method: New generation method. Name of a BaseAttributeGenerator
options...: Any options for the generator method
These generators are designed to create new attributes tailored for a specific application.

attributes generators clear – Clear the current list of attribute generators

attributes generators run – Run the currently-defined list of attribute generators

attributes properties <directory> – Specify directory that contains the elemental property lookup files
directory: Desired directory

attributes properties add <names...> – Add elemental properties to use when generating attributes
names...: Names of properties to add

attributes properties add set <name> – Add in all elemental properties from a pre-defined set
name: Name of the pre-defined set

attributes properties remove <names...> – Remove properties from list of those used when generating attributes
names...: Names of properties to remove

attributes properties – List which elemental properties are used to generate attributes

attributes rank <number> <method> [<options...>] – Rank attributes based on predictive power
number: Number of top attributes to print
method: Method used to rank attributes. Name of a BaseAttributeEvaluator
options...: Options for the evaluation method.

attributes – Print all attributes

combine $<dataset> – Add all entries from another dataset
dataset: Dataset to merge with this one. It will remain unchanged.

duplicates <resolver> [<resolver options>] – Eliminate duplicates within a dataset
resolver: Name of BaseDuplicateResolver used to handle duplicates
resolver options: Any options for the resolver

filter <include|exclude> <method> [<options...>] – Run dataset through a filter
include|exclude: Whether to include/exclude only entries that pass the filter
method: Filtering method. Name of a BaseDatasetFilter
options...: Options for the filter
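
The include and exclude modes differ only in which side of the filter is kept. A short Python sketch of that behavior follows. It is an illustration under assumptions rather than Magpie code: `run_filter` and the `passes` predicate are hypothetical names, and the example predicate is a plain substring check standing in for a real BaseDatasetFilter.

```python
def run_filter(entries, mode, passes):
    """Keep or drop entries according to a predicate (hypothetical helper).

    mode == "include": keep only the entries that pass the filter.
    mode == "exclude": keep only the entries that do not pass it.
    """
    if mode not in ("include", "exclude"):
        raise ValueError("mode must be 'include' or 'exclude'")
    keep_passing = mode == "include"
    return [e for e in entries if bool(passes(e)) == keep_passing]
```

Running the same predicate in both modes partitions the dataset: every entry ends up in exactly one of the two results.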

generate <method> [<options>] – Generate new entries
method: Name of a BaseEntryGenerator.
options: Any options for the entry generator

import <filename> [<options...>] – Import data by reading a file
filename: Name of file to import data from
options...: Any options used when parsing this dataset (specific to type of Dataset)

match $<dataset> <num> – Find most similar entries in this dataset
dataset: Dataset containing entries to be matched
num: Number of closest entries to print
Prints the most similar entries in this dataset to those in the dataset passed as the argument.

modify <method> [<options>] – Modify the dataset
method: How to modify dataset. Name of a BaseDatasetModifier.
options: Any options for the modifier

rank <number> <maximum|minimum> <measured|predicted> <method> [<options>] – Print the top-ranked entries based on some measure
number: Number of top entries to print
maximum|minimum: Whether to print entries with the largest or smallest objective function
measured|predicted: Whether to use the measured or predicted values when calculating the objective function
method: Objective function used to rank entries. Name of a BaseEntryRanker
options...: Any options for the objective function
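
The way the four arguments of rank interact can be sketched in Python as follows. This is an assumption-laden illustration, not Magpie's implementation: entries are modeled as dictionaries with hypothetical `measured` and `predicted` keys, and `objective` is a stand-in for a BaseEntryRanker that maps a class value to a score.

```python
def rank(entries, number, direction, source, objective=lambda value: value):
    """Return the top `number` entries by objective function.

    direction: "maximum" or "minimum" (largest or smallest score first).
    source: "measured" or "predicted" (which class value to score).
    """
    if direction not in ("maximum", "minimum"):
        raise ValueError("direction must be 'maximum' or 'minimum'")
    if source not in ("measured", "predicted"):
        raise ValueError("source must be 'measured' or 'predicted'")
    ordered = sorted(
        entries,
        key=lambda entry: objective(entry[source]),
        reverse=(direction == "maximum"),
    )
    return ordered[:number]
```

For example, passing `lambda v: abs(v - 1.0)` as the objective with `minimum` would rank entries by how close the chosen value is to a target of 1.0.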

subtract $<data> – Subtract another dataset from this one
data: Data to be subtracted from this dataset
After this operation, this dataset will only contain entries that are not in data.
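
Subtraction here behaves like a set difference applied to this dataset. A minimal Python sketch, assuming datasets can be modeled as lists of hashable entries and that two entries match when they compare equal (how Magpie actually decides that two entries are the same is not covered here):

```python
def subtract(entries, other):
    """Remove from entries, in place, every entry that also appears in other."""
    to_remove = set(other)
    entries[:] = [e for e in entries if e not in to_remove]
```

Entries that appear only in the second list have no effect on the result.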

target <name> [-keep] – Set class variable to be a certain property
name: Name of property to use as class variable
-keep: Whether to keep entries without a measurement for this property

Available Print Commands

These commands are run by calling "print <variable name> <command> [<options>]". Any output from that command will be printed to standard output.

description – Print description of this dataset

details – Print details about this class

dist – Print distribution of entries between known classes

Available Save Formats

Variables of this type can be saved in the following formats:

arff – Weka's ARFF format.
Requires that a measured value is available for the class variable of each entry.

csv – Comma-separated value format.
Writes the value of each attribute and the measured class variable, if defined.

json – Save dataset into JSON format

prop – Print out the measured and predicted properties

stats – Writes predicted and measured class variables.
This is intended to allow an external program to evaluate model performance.

template – Save an empty clone of the dataset using serialization