public class CompositionSetDistanceFilter extends BaseDatasetFilter implements java.lang.Cloneable
Filter that labels entries by their distance from a stored set of compositions. Each entry is labeled according to whether it lies farther than a given threshold from that set.
Note: Euclidean distance is much faster than Manhattan, because Weka's algorithms for fast neighbor search are not designed to work with Manhattan distance.
Developer note: This class uses different distance metrics than CompositionDistanceFilter in order to use the NearestNeighbourSearch classes from Weka to accelerate finding the closest compositions.
Usage: $<dataset> <-manhattan|-euclidean> <threshold>
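The labeling rule described above can be sketched in plain Java. This is a minimal, brute-force illustration and not this class's implementation: it assumes compositions are represented as equal-length arrays of element fractions, and the class and method names (`SetDistanceSketch`, `distanceToSet`, `label`) are hypothetical. The real class delegates the nearest-neighbor search to Weka.

```java
import java.util.Arrays;
import java.util.List;

public class SetDistanceSketch {

    /** Manhattan (L1) or Euclidean (L2) distance between two equal-length vectors. */
    static double distance(double[] a, double[] b, boolean manhattan) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = Math.abs(a[i] - b[i]);
            sum += manhattan ? diff : diff * diff;
        }
        return manhattan ? sum : Math.sqrt(sum);
    }

    /** Distance from a point to the closest member of the set (brute-force scan). */
    static double distanceToSet(double[] point, List<double[]> set, boolean manhattan) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] member : set) {
            best = Math.min(best, distance(point, member, manhattan));
        }
        return best;
    }

    /** Marks each entry true when it lies farther than the threshold from the set. */
    static boolean[] label(List<double[]> entries, List<double[]> set,
                           double threshold, boolean manhattan) {
        boolean[] output = new boolean[entries.size()];
        for (int i = 0; i < output.length; i++) {
            output[i] = distanceToSet(entries.get(i), set, manhattan) > threshold;
        }
        return output;
    }

    public static void main(String[] args) {
        // Assumed representation: fractions of two elements, e.g. A0.5B0.5 and pure A.
        List<double[]> set = Arrays.asList(new double[][] {{0.5, 0.5}});
        List<double[]> entries = Arrays.asList(new double[][] {{0.5, 0.5}, {1.0, 0.0}});
        System.out.println(Arrays.toString(label(entries, set, 0.8, false)));
        System.out.println(Arrays.toString(label(entries, set, 0.8, true)));
    }
}
```

With a threshold of 0.8, the second entry is about 0.707 from the set under the Euclidean metric but 1.0 under the Manhattan metric, so the choice of metric changes its label.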
Constructor and Description |
---|
CompositionSetDistanceFilter() |
Modifier and Type | Method | Description |
---|---|---|
void | addComposition(CompositionEntry entry) | Add a new composition to the dataset. |
void | addCompositions(java.util.Collection<CompositionEntry> comps) | Add a list of compositions to the set. |
void | clearCompositions() | Clear the list of compositions in the set. |
CompositionSetDistanceFilter | clone() | |
double | computeDistance(CompositionEntry entry) | Compute the distance between a composition and the set of other compositions stored in this object. |
protected static weka.core.Instances | convertCompositionsToInstances(java.util.Collection<CompositionEntry> entries) | Convert a collection of entries to a Weka Instances object. |
static weka.core.Instance | convertCompositionToInstance(CompositionEntry entry) | Convert a composition to a Weka instance. |
boolean[] | label(Dataset D) | Given a dataset, determine which entries pass the filter. |
void | makeNeighborSearchTool() | Generate a tool to enable fast searching for nearest compositions. |
protected int | parallelMinimum() | Minimum number of entries to label in parallel. |
java.lang.String | printUsage() | Print out required format for options. |
void | setCompositions(CompositionDataset data) | Set the list of compositions to be considered. |
void | setDistanceThreshold(double dist) | Set the distance threshold. |
void | setOptions(java.util.List<java.lang.Object> Options) | Set any options for this object. |
void | setUseEuclidean() | Use Euclidean distance as the distance metric. |
void | setUseManhattan() | Use Manhattan distance as the distance metric. |
void | setUseManhattan(boolean manhattan) | Set whether to use Manhattan (vs Euclidean) distance. |
void | train(Dataset TrainingSet) | Train a dataset splitter, if necessary. |
Methods inherited from class BaseDatasetFilter: filter, parallelLabel, setExclude, toExclude

public void setOptions(java.util.List<java.lang.Object> Options) throws java.lang.Exception
Specified by: setOptions in interface Options
Parameters: Options - Array of options as Objects (can be null)
Throws: java.lang.Exception - if problem with inputs

public CompositionSetDistanceFilter clone()
Overrides: clone in class java.lang.Object

public java.lang.String printUsage()
Specified by: printUsage in interface Options
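
As an illustration of the option format in the usage string for this class, the sketch below validates a three-element option list: a dataset object, a metric flag, and a threshold. It is not the parsing code of `setOptions`. The class `FilterOptionsSketch`, its fields, and the choices to spell the flag `-manhattan`, to match flags case-insensitively, and to reject a null list are all assumptions made for this example.

```java
import java.util.List;
import java.util.Locale;

public class FilterOptionsSketch {
    static final String USAGE = "Usage: $<dataset> <-manhattan|-euclidean> <threshold>";

    final Object dataset;        // kept opaque here; a dataset object in practice
    final boolean useManhattan;  // false means Euclidean
    final double threshold;      // must be greater than zero

    private FilterOptionsSketch(Object dataset, boolean useManhattan, double threshold) {
        this.dataset = dataset;
        this.useManhattan = useManhattan;
        this.threshold = threshold;
    }

    /** Validates an option list laid out as [dataset, metric flag, threshold]. */
    static FilterOptionsSketch parse(List<Object> options) {
        if (options == null || options.size() != 3) {
            throw new IllegalArgumentException(USAGE);
        }
        String flag = String.valueOf(options.get(1)).toLowerCase(Locale.ROOT);
        boolean manhattan;
        if (flag.equals("-manhattan")) {
            manhattan = true;
        } else if (flag.equals("-euclidean")) {
            manhattan = false;
        } else {
            throw new IllegalArgumentException(USAGE);
        }
        double threshold;
        try {
            threshold = Double.parseDouble(String.valueOf(options.get(2)));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(USAGE, e);
        }
        // Mirrors the documented rule that a threshold <= 0 is an error (also rejects NaN).
        if (!(threshold > 0)) {
            throw new IllegalArgumentException("Threshold must be greater than zero");
        }
        return new FilterOptionsSketch(options.get(0), manhattan, threshold);
    }
}
```

Rejecting bad input before any state is changed keeps the filter's metric and threshold consistent if parsing fails partway through.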

public void setUseManhattan(boolean manhattan)
Parameters: manhattan - Desired setting

public void setUseManhattan()
See Also: setUseEuclidean()

public void setUseEuclidean()
See Also: setUseManhattan()

public void clearCompositions()

public void addComposition(CompositionEntry entry)
Parameters: entry - Entry to be added

public void addCompositions(java.util.Collection<CompositionEntry> comps)
Parameters: comps - Collection of compositions to be added

public void setCompositions(CompositionDataset data)
Parameters: data - Dataset containing the compositions to use as the set

public void setDistanceThreshold(double dist) throws java.lang.Exception
Parameters: dist - Desired distance threshold
Throws: java.lang.Exception - If distance is ≤ 0

protected int parallelMinimum()
Superclass method: parallelMinimum in class BaseDatasetFilter

public void train(Dataset TrainingSet)
Superclass method: train in class BaseDatasetFilter
Parameters: TrainingSet - Dataset to use for training

public boolean[] label(Dataset D)
Superclass method: label in class BaseDatasetFilter
Parameters: D - Dataset to be labeled

public static weka.core.Instance convertCompositionToInstance(CompositionEntry entry)
Parameters: entry - Entry to be converted

protected static weka.core.Instances convertCompositionsToInstances(java.util.Collection<CompositionEntry> entries)
Parameters: entries - Entries to be converted

public void makeNeighborSearchTool()

public double computeDistance(CompositionEntry entry)
Parameters: entry - Composition of entry