weka.filters
Class SplitDatasetFilter

java.lang.Object
  |
  +--weka.filters.Filter
        |
        +--weka.filters.SplitDatasetFilter
All Implemented Interfaces:
OptionHandler, java.io.Serializable

public class SplitDatasetFilter
extends Filter
implements OptionHandler

This filter takes a dataset and outputs a subset of it. If a class attribute is assigned, the dataset is stratified when fold-based splitting is used. Valid options are:

-R inst1,inst2-inst4,...
Specifies the list of instances to select. "first" and "last" are valid indexes. (default: fold-based splitting)

-V
Specifies that the inverse of the selection is to be output.

-N number of folds
Specifies the number of folds the dataset is split into. (default 10)

-F fold
Specifies which fold is selected. (default 1)

-S seed
Specifies a random number seed for shuffling the dataset. (default 0, don't randomize)

-A
If set, the data is not stratified even if a class index is set.
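
To make the fold options concrete, the following sketch computes which instance positions belong to fold -F when a dataset is cut into -N contiguous folds. The names FoldRangeSketch and foldBounds are hypothetical, and the rule for distributing the remainder (the earliest folds each take one extra instance) is an assumption; this page does not state how the filter sizes its folds.

```java
// Sketch only: FoldRangeSketch and foldBounds are hypothetical names,
// not part of weka.filters.SplitDatasetFilter.
class FoldRangeSketch {

    /**
     * Returns {first, lastExclusive} as 0-based positions for a 1-based fold.
     * Assumption: folds are contiguous, and the first (numInstances % numFolds)
     * folds each receive one extra instance.
     */
    static int[] foldBounds(int numInstances, int numFolds, int fold) {
        if (numFolds < 1 || fold < 1 || fold > numFolds) {
            throw new IllegalArgumentException("fold must be in 1..numFolds");
        }
        int base = numInstances / numFolds;   // minimum size of every fold
        int extra = numInstances % numFolds;  // number of folds with one more
        int k = fold - 1;                     // 0-based fold number
        int first = k * base + Math.min(k, extra);
        int size = base + (k < extra ? 1 : 0);
        return new int[] {first, first + size};
    }
}
```

Under these assumptions, 23 instances split into the default 10 folds give fold 1 the positions 0 to 2 and fold 10 the positions 21 to 22.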

Author:
Eibe Frank (eibe@cs.waikato.ac.nz)
See Also:
Serialized Form

Fields inherited from class weka.filters.Filter
m_NewBatch
 
Constructor Summary
SplitDatasetFilter()
           
 
Method Summary
 boolean batchFinished()
          Signifies that this batch of input to the filter is finished.
 boolean getDontStratifyData()
          Gets whether stratification is not performed.
 int getFold()
          Gets the fold which is selected.
 java.lang.String getInstancesIndices()
          Gets ranges of instances selected.
 boolean getInvertSelection()
          Gets whether the selection is to be inverted.
 int getNumFolds()
          Gets the number of folds the dataset is to be split into.
 java.lang.String[] getOptions()
          Gets the current settings of the filter.
 long getSeed()
          Gets the random number seed used for shuffling the dataset.
 java.util.Enumeration listOptions()
          Gets an enumeration describing the available options.
static void main(java.lang.String[] argv)
          Main method for testing this class.
 void setDontStratifyData(boolean flag)
          Sets whether stratification is not performed.
 void setFold(int fold)
          Selects a fold.
 boolean setInputFormat(Instances instanceInfo)
          Sets the format of the input instances.
 void setInstancesIndices(java.lang.String rangeList)
          Sets the ranges of instances to be selected.
 void setInvertSelection(boolean inverse)
          Sets whether the selection is to be inverted.
 void setNumFolds(int numFolds)
          Sets the number of folds the dataset is split into.
 void setOptions(java.lang.String[] options)
          Parses the options for this object.
 void setSeed(long seed)
          Sets the random number seed for shuffling the dataset.
 
Methods inherited from class weka.filters.Filter
batchFilterFile, bufferInput, copyStringValues, copyStringValues, filterFile, flushInput, getInputFormat, getInputStringIndex, getOutputFormat, getOutputStringIndex, getStringIndices, input, inputFormat, isOutputFormatDefined, numPendingOutput, output, outputFormat, outputFormatPeek, outputPeek, push, resetQueue, setOutputFormat, useFilter
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SplitDatasetFilter

public SplitDatasetFilter()
Method Detail

listOptions

public java.util.Enumeration listOptions()
Gets an enumeration describing the available options.
Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses the options for this object. Valid options are:

-R inst1,inst2-inst4,...
Specifies the list of instances to select. "first" and "last" are valid indexes. (default: fold-based splitting)

-V
Specifies that the inverse of the selection is to be output.

-N number of folds
Specifies the number of folds the dataset is split into. (default 10)

-F fold
Specifies which fold is selected. (default 1)

-S seed
Specifies a random number seed for shuffling the dataset. (default 0, no randomizing)

-A
If set, the data is not stratified even if a class index is set.

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
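
As an illustration of how an option array is laid out (flags stand alone, valued options are followed by their argument as a separate element), here is a minimal stand-alone parser. It is a sketch, not the filter's implementation: OptionArraySketch is a hypothetical class, the defaults are copied from the option list above, and it throws IllegalArgumentException where the documented method declares java.lang.Exception.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: a hypothetical stand-alone parser, not Weka code.
class OptionArraySketch {

    /** Maps each option to its value, starting from the documented defaults. */
    static Map<String, String> parse(String[] options) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("-N", "10");     // number of folds
        settings.put("-F", "1");      // selected fold
        settings.put("-S", "0");      // seed (0: don't randomize)
        settings.put("-V", "false");  // invert selection
        settings.put("-A", "false");  // don't stratify
        for (int i = 0; i < options.length; i++) {
            String opt = options[i];
            switch (opt) {
                case "-V":
                case "-A":
                    settings.put(opt, "true");  // flags take no argument
                    break;
                case "-R":
                case "-N":
                case "-F":
                case "-S":
                    if (i + 1 >= options.length) {
                        throw new IllegalArgumentException("Missing value for " + opt);
                    }
                    settings.put(opt, options[++i]);  // consume the argument
                    break;
                default:
                    throw new IllegalArgumentException("Unsupported option: " + opt);
            }
        }
        return settings;
    }
}
```

For instance, the array {"-N", "5", "-F", "2", "-V"} would request fold 2 of 5 with the selection inverted, leaving the seed at its default.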

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the filter.
Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions

getInstancesIndices

public java.lang.String getInstancesIndices()
Gets ranges of instances selected.
Returns:
a string containing a comma-separated list of ranges

setInstancesIndices

public void setInstancesIndices(java.lang.String rangeList)
Sets the ranges of instances to be selected. If the provided string is null, ranges are not used for selecting instances.
Parameters:
rangeList - a string representing the list of instances, e.g. first-3,5,6-last
Throws:
java.lang.IllegalArgumentException - if an invalid range list is supplied
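
The range-list syntax can be illustrated with a small stand-alone expander. This is a sketch under stated assumptions, not the filter's parser: RangeListSketch is a hypothetical class, indexes are taken to be 1-based, and "first" and "last" are taken to mean 1 and the number of instances.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: a hypothetical expander for strings such as "first-3,5,6-last".
class RangeListSketch {

    /** Expands a range list into 1-based instance indexes, in the order given. */
    static List<Integer> expand(String rangeList, int numInstances) {
        List<Integer> selected = new ArrayList<>();
        for (String part : rangeList.split(",")) {
            // limit -1 keeps trailing empty tokens, so "3-" is rejected below
            String[] ends = part.trim().split("-", -1);
            if (ends.length > 2) {
                throw new IllegalArgumentException("Invalid range: " + part);
            }
            int lo = index(ends[0], numInstances);
            int hi = ends.length == 2 ? index(ends[1], numInstances) : lo;
            if (lo > hi) {
                throw new IllegalArgumentException("Invalid range: " + part);
            }
            for (int i = lo; i <= hi; i++) {
                selected.add(i);
            }
        }
        return selected;
    }

    /** Resolves "first", "last", or a number to a 1-based index. */
    private static int index(String token, int numInstances) {
        String t = token.trim();
        int value;
        if (t.equals("first")) {
            value = 1;
        } else if (t.equals("last")) {
            value = numInstances;
        } else {
            try {
                value = Integer.parseInt(t);
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("Invalid index: " + token);
            }
        }
        if (value < 1 || value > numInstances) {
            throw new IllegalArgumentException("Index out of range: " + token);
        }
        return value;
    }
}
```

With 8 instances, the example string first-3,5,6-last expands to the indexes 1, 2, 3, 5, 6, 7, 8 under these assumptions.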

getInvertSelection

public boolean getInvertSelection()
Gets whether the selection is to be inverted.
Returns:
true if the selection is to be inverted

setInvertSelection

public void setInvertSelection(boolean inverse)
Sets whether the selection is to be inverted.
Parameters:
inverse - true if inversion is to be performed
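
Inversion is easiest to see as taking the complement of a selection, for example a held-out fold versus everything outside it. The sketch below uses a hypothetical helper, InvertSelectionSketch.select, over a plain list; it does not reproduce the filter's internals.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: a hypothetical helper showing selection versus its inverse.
class InvertSelectionSketch {

    /**
     * Keeps the elements at positions [first, lastExclusive) when invert is
     * false, and all other elements when invert is true.
     */
    static <T> List<T> select(List<T> data, int first, int lastExclusive, boolean invert) {
        List<T> out = new ArrayList<>();
        for (int i = 0; i < data.size(); i++) {
            boolean inRange = i >= first && i < lastExclusive;
            if (inRange != invert) {
                out.add(data.get(i));
            }
        }
        return out;
    }
}
```

Calling the helper twice on the same data with the same bounds, once with invert set to false and once with true, yields two disjoint lists that together contain every element.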

getNumFolds

public int getNumFolds()
Gets the number of folds the dataset is to be split into.
Returns:
the number of folds the dataset is to be split into.

setNumFolds

public void setNumFolds(int numFolds)
Sets the number of folds the dataset is split into. If the number of folds is zero, the dataset is not split into folds.
Parameters:
numFolds - the number of folds the dataset is to be split into
Throws:
java.lang.IllegalArgumentException - if number of folds is negative

getFold

public int getFold()
Gets the fold which is selected.
Returns:
the fold which is selected

setFold

public void setFold(int fold)
Selects a fold.
Parameters:
fold - the fold to be selected.
Throws:
java.lang.IllegalArgumentException - if the fold index is smaller than 1

getSeed

public long getSeed()
Gets the random number seed used for shuffling the dataset.
Returns:
the random number seed

setSeed

public void setSeed(long seed)
Sets the random number seed for shuffling the dataset. If the seed is negative, shuffling is not performed; according to the -S option description, the default seed of 0 also disables shuffling.
Parameters:
seed - the random number seed
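
The point of a seed is that the same seed reproduces the same ordering. The sketch below shows this with java.util.Random and Collections.shuffle. SeededShuffleSketch is a hypothetical class; treating only positive seeds as a request to shuffle is an assumption that combines the statements about negative seeds and the default of 0, and the resulting order will not necessarily match the order the filter produces.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch only: a hypothetical helper, not the filter's shuffling code.
class SeededShuffleSketch {

    /**
     * Returns a copy of data, shuffled deterministically when seed > 0.
     * Assumption: seeds of 0 or below leave the order unchanged.
     */
    static <T> List<T> maybeShuffle(List<T> data, long seed) {
        List<T> copy = new ArrayList<>(data);
        if (seed > 0) {
            Collections.shuffle(copy, new Random(seed));
        }
        return copy;
    }
}
```

Two calls with the same positive seed return identical orderings, which is what makes a fold-based split repeatable across runs.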

setDontStratifyData

public void setDontStratifyData(boolean flag)
Sets whether stratification is not performed.

getDontStratifyData

public boolean getDontStratifyData()
Gets whether stratification is not performed.
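
Stratification aims to keep the class proportions of each fold close to those of the whole dataset. One simple way to get that effect, shown below, is to group instances by class and deal them out to the folds in turn. This is a sketch of the general idea only: StratifySketch and assignFolds are hypothetical names, and the filter's actual stratification procedure is not described on this page.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: round-robin dealing by class, not Weka's algorithm.
class StratifySketch {

    /** Returns the 1-based fold assigned to each instance, given its class label. */
    static int[] assignFolds(String[] classLabels, int numFolds) {
        if (numFolds < 1) {
            throw new IllegalArgumentException("numFolds must be at least 1");
        }
        // Group instance positions by class, preserving first-seen class order.
        Map<String, List<Integer>> byClass = new LinkedHashMap<>();
        for (int i = 0; i < classLabels.length; i++) {
            byClass.computeIfAbsent(classLabels[i], k -> new ArrayList<>()).add(i);
        }
        int[] fold = new int[classLabels.length];
        int next = 0;  // continues across classes so fold sizes stay balanced
        for (List<Integer> members : byClass.values()) {
            for (int idx : members) {
                fold[idx] = (next % numFolds) + 1;
                next++;
            }
        }
        return fold;
    }
}
```

With four instances of one class followed by four of another and two folds, each fold receives two instances of each class.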

setInputFormat

public boolean setInputFormat(Instances instanceInfo)
                       throws java.lang.Exception
Sets the format of the input instances.
Overrides:
setInputFormat in class Filter
Parameters:
instanceInfo - an Instances object containing the input instance structure (any instances contained in the object are ignored - only the structure is required).
Returns:
true because outputFormat can be collected immediately
Throws:
java.lang.Exception - if the input format can't be set successfully

batchFinished

public boolean batchFinished()
Signifies that this batch of input to the filter is finished. output() may now be called to retrieve the filtered instances.
Overrides:
batchFinished in class Filter
Returns:
true if there are instances pending output
Throws:
java.lang.IllegalStateException - if no input structure has been defined

main

public static void main(java.lang.String[] argv)
Main method for testing this class.
Parameters:
argv - should contain arguments to the filter: use -h for help