java.lang.Object
  |
  +--weka.filters.Filter
        |
        +--weka.filters.DiscretizeFilter
An instance filter that discretizes a range of numeric attributes in the dataset into nominal attributes. Discretization can be either by simple binning, or by Fayyad & Irani's MDL method (the default).
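As a rough illustration of class-based discretization, the sketch below finds the single best cut point for one attribute by class entropy and accepts it only if it passes the MDL test published by Fayyad & Irani. It is a standalone sketch, not the code of this class: the class name `MdlSplitSketch` and its methods are hypothetical, the input is assumed to be sorted by attribute value, and only one split is evaluated, whereas the full method applies the same test recursively to each resulting interval.

```java
/** Illustrative sketch only; not the Weka implementation. */
class MdlSplitSketch {

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** Class entropy (in bits) of a vector of class counts. */
    static double entropy(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        if (total == 0) return 0.0;
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * log2(p);
            }
        }
        return h;
    }

    /** Number of classes that actually occur in the counts. */
    static int distinctClasses(int[] counts) {
        int k = 0;
        for (int c : counts) if (c > 0) k++;
        return k;
    }

    /**
     * Returns the best cut point if the MDL criterion accepts it, otherwise NaN.
     * Assumes values are sorted ascending and classes[i] is in [0, numClasses).
     */
    static double bestCut(double[] values, int[] classes, int numClasses) {
        int n = values.length;
        int[] all = new int[numClasses];
        for (int c : classes) all[c]++;
        double entAll = entropy(all);
        int kAll = distinctClasses(all);

        int[] left = new int[numClasses];
        int[] right = all.clone();
        double bestGain = 0.0, bestCut = Double.NaN, bestE1 = 0.0, bestE2 = 0.0;
        int bestK1 = 0, bestK2 = 0;
        for (int i = 0; i < n - 1; i++) {
            left[classes[i]]++;
            right[classes[i]]--;
            if (values[i] == values[i + 1]) continue; // cut only between distinct values
            int nLeft = i + 1;
            double e1 = entropy(left), e2 = entropy(right);
            double gain = entAll - ((double) nLeft / n) * e1 - ((double) (n - nLeft) / n) * e2;
            if (gain > bestGain) {
                bestGain = gain;
                bestCut = (values[i] + values[i + 1]) / 2.0;
                bestE1 = e1;
                bestE2 = e2;
                bestK1 = distinctClasses(left);
                bestK2 = distinctClasses(right);
            }
        }
        if (Double.isNaN(bestCut)) return Double.NaN;
        // Fayyad & Irani: accept if gain > (log2(N - 1) + delta) / N
        double delta = log2(Math.pow(3, kAll) - 2)
                - (kAll * entAll - bestK1 * bestE1 - bestK2 * bestE2);
        double threshold = (log2(n - 1) + delta) / n;
        return bestGain > threshold ? bestCut : Double.NaN;
    }

    public static void main(String[] args) {
        double[] values = {1, 2, 3, 10, 11, 12};
        int[] classes = {0, 0, 0, 1, 1, 1};
        System.out.println(bestCut(values, classes, 2)); // prints 6.5
    }
}
```

When the classes are interleaved rather than cleanly separated, the gain of the best split usually falls below the threshold and no cut point is produced, which is how the criterion avoids over-splitting.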
Valid filter-specific options are:
-B num
Specify the (maximum) number of bins to divide numeric attributes into.
(default class-based discretisation).
-O
Optimizes the number of bins using a leave-one-out estimate of the
entropy.
-R col1,col2-col4,...
Specify list of columns to discretize. First
and last are valid indexes. (default none)
-V
Invert matching sense.
-D
Make binary nominal attributes.
-E
Use better encoding of split point for MDL.
-K
Use Kononenko's MDL criterion.
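To make the -B option concrete, the following standalone sketch computes equal-width cut points for a single numeric attribute. It is an illustration under stated assumptions, not the code of this class: the class name `EqualWidthBinning` and the method `cutPoints` are hypothetical, and the handling of a constant attribute (no cut points at all) is a choice made for this example.

```java
import java.util.Arrays;

/** Illustrative sketch only; not the Weka implementation. */
class EqualWidthBinning {

    /** Returns numBins - 1 cut points that split [min, max] into equal-width intervals. */
    static double[] cutPoints(double[] values, int numBins) {
        if (values.length == 0 || numBins < 1) {
            throw new IllegalArgumentException("need at least one value and one bin");
        }
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double width = (max - min) / numBins;
        // Assumption for this sketch: a constant attribute needs no cut points.
        if (width == 0.0) {
            return new double[0];
        }
        double[] cuts = new double[numBins - 1];
        for (int i = 0; i < cuts.length; i++) {
            cuts[i] = min + width * (i + 1);
        }
        return cuts;
    }

    public static void main(String[] args) {
        double[] values = {0.0, 2.5, 4.0, 7.5, 10.0};
        System.out.println(Arrays.toString(cutPoints(values, 4))); // prints [2.5, 5.0, 7.5]
    }
}
```

Four bins over the range 0 to 10 give a width of 2.5 and therefore three cut points; n bins always need n - 1 cut points.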
Field Summary
protected double[][] | m_CutPoints | Store the current cutpoints
protected Range | m_DiscretizeCols | Stores which columns to discretize
protected boolean | m_FindNumBins | Find the number of bins using cross-validated entropy.
protected boolean | m_MakeBinary | Output binary attributes for discretized attributes.
protected int | m_NumBins | The number of bins to divide the attribute into
protected boolean | m_UseBetterEncoding | Use better encoding of split point for MDL.
protected boolean | m_UseKononenko | Use Kononenko's MDL criterion instead of Fayyad et al.'s
protected boolean | m_UseMDL | True if discretisation will be done by MDL rather than binning
Fields inherited from class weka.filters.Filter
m_NewBatch
Constructor Summary
DiscretizeFilter() | Constructor - initialises the filter
Method Summary
java.lang.String | attributeIndicesTipText() | Returns the tip text for this property
boolean | batchFinished() | Signifies that this batch of input to the filter is finished.
java.lang.String | binsTipText() | Returns the tip text for this property
protected void | calculateCutPoints() | Generate the cutpoints for each attribute
protected void | calculateCutPointsByBinning(int index) | Set cutpoints for a single attribute.
protected void | calculateCutPointsByMDL(int index, Instances data) | Set cutpoints for a single attribute using MDL.
protected void | convertInstance(Instance instance) | Convert a single instance over.
protected void | findNumBins(int index) | Optimizes the number of bins using leave-one-out cross-validation.
java.lang.String | findNumBinsTipText() | Returns the tip text for this property
java.lang.String | getAttributeIndices() | Gets the current range selection
int | getBins() | Gets the number of bins numeric attributes will be divided into
double[] | getCutPoints(int attributeIndex) | Gets the cut points for an attribute
boolean | getFindNumBins() | Get the value of FindNumBins.
boolean | getInvertSelection() | Gets whether the supplied columns are to be removed or kept
boolean | getMakeBinary() | Gets whether binary attributes should be made for discretized ones.
java.lang.String[] | getOptions() | Gets the current settings of the filter.
boolean | getUseBetterEncoding() | Gets whether better encoding is to be used for MDL.
boolean | getUseKononenko() | Gets whether Kononenko's MDL criterion is to be used.
boolean | getUseMDL() | Gets whether MDL will be used as the discretisation method.
java.lang.String | globalInfo() | Returns a string describing this filter
boolean | input(Instance instance) | Input an instance for filtering.
java.lang.String | invertSelectionTipText() | Returns the tip text for this property
java.util.Enumeration | listOptions() | Gets an enumeration describing the available options
static void | main(java.lang.String[] argv) | Main method for testing this class.
java.lang.String | makeBinaryTipText() | Returns the tip text for this property
void | setAttributeIndices(java.lang.String rangeList) | Sets which attributes are to be discretized (only numeric attributes among the selection will be discretized).
void | setAttributeIndicesArray(int[] attributes) | Sets which attributes are to be discretized (only numeric attributes among the selection will be discretized).
void | setBins(int numBins) | Sets the number of bins to divide each selected numeric attribute into
void | setFindNumBins(boolean newFindNumBins) | Set the value of FindNumBins.
boolean | setInputFormat(Instances instanceInfo) | Sets the format of the input instances.
void | setInvertSelection(boolean invert) | Sets whether selected columns should be removed or kept.
void | setMakeBinary(boolean makeBinary) | Sets whether binary attributes should be made for discretized ones.
void | setOptions(java.lang.String[] options) | Parses the options for this object.
protected void | setOutputFormat() | Set the output format.
void | setUseBetterEncoding(boolean useBetterEncoding) | Sets whether better encoding is to be used for MDL.
void | setUseKononenko(boolean useKon) | Sets whether Kononenko's MDL criterion is to be used.
void | setUseMDL(boolean useMDL) | Sets whether MDL will be used as the discretisation method.
java.lang.String | useBetterEncodingTipText() | Returns the tip text for this property
java.lang.String | useKononenkoTipText() | Returns the tip text for this property
java.lang.String | useMDLTipText() | Returns the tip text for this property
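Once cut points are known for an attribute, for instance the array returned by getCutPoints(int attributeIndex), a numeric value can be mapped to a nominal bin by comparing it against them in order. The sketch below shows one way to do this. It is hypothetical: the class `BinLookup` is not part of this API, the cut points are assumed to be sorted ascending, and the convention that a value equal to a cut point falls into the lower bin is an assumption of this example.

```java
/** Illustrative sketch only; not the Weka implementation. */
class BinLookup {

    /**
     * Returns the bin index in [0, cutPoints.length] for the given value.
     * Assumes cutPoints is sorted ascending; a value equal to a cut point
     * is placed in the lower bin (an assumption of this sketch).
     */
    static int binIndex(double value, double[] cutPoints) {
        int bin = 0;
        while (bin < cutPoints.length && value > cutPoints[bin]) {
            bin++;
        }
        return bin;
    }

    public static void main(String[] args) {
        double[] cuts = {2.5, 5.0, 7.5};
        System.out.println(binIndex(1.0, cuts)); // prints 0
        System.out.println(binIndex(2.5, cuts)); // prints 0
        System.out.println(binIndex(6.0, cuts)); // prints 2
        System.out.println(binIndex(9.0, cuts)); // prints 3
    }
}
```

An attribute with no cut points maps every value to bin 0, so it becomes a nominal attribute with a single value.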
Methods inherited from class weka.filters.Filter
batchFilterFile, bufferInput, copyStringValues, copyStringValues, filterFile, flushInput, getInputFormat, getInputStringIndex, getOutputFormat, getOutputStringIndex, getStringIndices, inputFormat, isOutputFormatDefined, numPendingOutput, output, outputFormat, outputFormatPeek, outputPeek, push, resetQueue, setOutputFormat, useFilter
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
protected Range m_DiscretizeCols
protected int m_NumBins
protected double[][] m_CutPoints
protected boolean m_UseMDL
protected boolean m_MakeBinary
protected boolean m_UseBetterEncoding
protected boolean m_UseKononenko
protected boolean m_FindNumBins
Constructor Detail
public DiscretizeFilter()
Method Detail
public java.util.Enumeration listOptions()
Specified by: listOptions in interface OptionHandler
public void setOptions(java.lang.String[] options) throws java.lang.Exception
-B num
Specify the (maximum) number of equal-width bins to divide
numeric attributes into. (default class-based discretization).
-O
Optimizes the number of bins using a leave-one-out estimate of the
entropy.
-R col1,col2-col4,...
Specify list of columns to discretize. First
and last are valid indexes. (default none)
-V
Invert matching sense.
-D
Make binary nominal attributes.
-E
Use better encoding of split point for MDL.
-K
Use Kononenko's MDL criterion.
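The -R range syntax and the -V flag listed above can be illustrated with a small standalone parser that turns a 1-based range list such as `1,3-5,last` into 0-based attribute indexes. This is a sketch under assumptions, not the parser used by this class: `RangeListSketch` and its methods are hypothetical names, and treating `first` and `last` as case-insensitive keywords is a choice made here.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not the Weka implementation. */
class RangeListSketch {

    /**
     * Parses a 1-based range list into sorted 0-based indexes.
     * If invert is true, the attributes NOT named in the list are returned.
     */
    static int[] parse(String rangeList, int numAttributes, boolean invert) {
        boolean[] selected = new boolean[numAttributes];
        for (String part : rangeList.split(",")) {
            String token = part.trim();
            if (token.isEmpty()) continue;
            int dash = token.indexOf('-');
            int from = toIndex(dash < 0 ? token : token.substring(0, dash), numAttributes);
            int to = dash < 0 ? from : toIndex(token.substring(dash + 1), numAttributes);
            if (from > to) {
                throw new IllegalArgumentException("Invalid range: " + token);
            }
            for (int i = from; i <= to; i++) selected[i] = true;
        }
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < numAttributes; i++) {
            if (selected[i] != invert) out.add(i);
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Converts one 1-based token ("first", "last" or a number) to a 0-based index. */
    static int toIndex(String s, int numAttributes) {
        String t = s.trim();
        int oneBased;
        if (t.equalsIgnoreCase("first")) {
            oneBased = 1;
        } else if (t.equalsIgnoreCase("last")) {
            oneBased = numAttributes;
        } else {
            try {
                oneBased = Integer.parseInt(t);
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("Invalid index: " + s, e);
            }
        }
        if (oneBased < 1 || oneBased > numAttributes) {
            throw new IllegalArgumentException("Index out of range: " + s);
        }
        return oneBased - 1;
    }

    public static void main(String[] args) {
        // With 6 attributes, "1,3-5" selects 0-based indexes 0, 2, 3, 4.
        System.out.println(java.util.Arrays.toString(parse("1,3-5", 6, false)));
    }
}
```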
Specified by: setOptions in interface OptionHandler
Parameters: options - the list of options as an array of strings
Throws: java.lang.Exception - if an option is not supported
public java.lang.String[] getOptions()
Specified by: getOptions in interface OptionHandler
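Options travel as a flat array of strings in which a flag that takes a value (-B, -R) is followed by that value as the next element. The sketch below illustrates that layout with a hypothetical settings holder that parses such an array and writes it back out. `DiscretizeOptionsSketch` is not part of this API, and representing "no -B given" as a null bin count is an assumption of the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not the Weka implementation. */
class DiscretizeOptionsSketch {
    Integer numBins;        // -B num; null means -B was not given (class-based discretization)
    boolean findNumBins;    // -O
    String columns = "";    // -R list; empty means none
    boolean invert;         // -V
    boolean makeBinary;     // -D
    boolean betterEncoding; // -E
    boolean kononenko;      // -K

    /** Parses a flat option array; rejects flags this sketch does not know. */
    static DiscretizeOptionsSketch parse(String[] options) {
        DiscretizeOptionsSketch s = new DiscretizeOptionsSketch();
        for (int i = 0; i < options.length; i++) {
            switch (options[i]) {
                case "-B": s.numBins = Integer.valueOf(argument(options, ++i)); break;
                case "-R": s.columns = argument(options, ++i); break;
                case "-O": s.findNumBins = true; break;
                case "-V": s.invert = true; break;
                case "-D": s.makeBinary = true; break;
                case "-E": s.betterEncoding = true; break;
                case "-K": s.kononenko = true; break;
                default:
                    throw new IllegalArgumentException("Unsupported option: " + options[i]);
            }
        }
        return s;
    }

    /** Returns the value that follows a flag, or fails if the array ends early. */
    private static String argument(String[] options, int i) {
        if (i >= options.length) {
            throw new IllegalArgumentException("Missing argument for " + options[i - 1]);
        }
        return options[i];
    }

    /** Writes the current settings back as a flat option array. */
    String[] toOptions() {
        List<String> out = new ArrayList<>();
        if (numBins != null) { out.add("-B"); out.add(String.valueOf(numBins)); }
        if (findNumBins) out.add("-O");
        if (!columns.isEmpty()) { out.add("-R"); out.add(columns); }
        if (invert) out.add("-V");
        if (makeBinary) out.add("-D");
        if (betterEncoding) out.add("-E");
        if (kononenko) out.add("-K");
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        DiscretizeOptionsSketch s = parse(new String[] {"-B", "5", "-R", "1,3-5", "-V"});
        System.out.println(String.join(" ", s.toOptions())); // prints -B 5 -R 1,3-5 -V
    }
}
```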
public boolean setInputFormat(Instances instanceInfo) throws java.lang.Exception
Overrides: setInputFormat in class Filter
Parameters: instanceInfo - an Instances object containing the input instance structure (any instances contained in the object are ignored - only the structure is required).
Throws: java.lang.Exception - if the input format can't be set successfully
public boolean input(Instance instance)
Overrides: input in class Filter
Parameters: instance - the input instance
Throws: java.lang.IllegalStateException - if no input format has been defined.
public boolean batchFinished()
Overrides: batchFinished in class Filter
Throws: java.lang.IllegalStateException - if no input structure has been defined
public java.lang.String globalInfo()
public java.lang.String findNumBinsTipText()
public boolean getFindNumBins()
public void setFindNumBins(boolean newFindNumBins)
Parameters: newFindNumBins - Value to assign to FindNumBins.
public java.lang.String makeBinaryTipText()
public boolean getMakeBinary()
public void setMakeBinary(boolean makeBinary)
Parameters: makeBinary - if binary attributes are to be made
public java.lang.String useMDLTipText()
public boolean getUseMDL()
public void setUseMDL(boolean useMDL)
Parameters: useMDL - true if MDL should be used, false if fixed bins should be used.
public java.lang.String useKononenkoTipText()
public boolean getUseKononenko()
public void setUseKononenko(boolean useKon)
Parameters: useKon - true if Kononenko's criterion is to be used
public java.lang.String useBetterEncodingTipText()
public boolean getUseBetterEncoding()
public void setUseBetterEncoding(boolean useBetterEncoding)
Parameters: useBetterEncoding - true if better encoding is to be used.
public java.lang.String binsTipText()
public int getBins()
public void setBins(int numBins)
Parameters: numBins - the number of bins
public java.lang.String invertSelectionTipText()
public boolean getInvertSelection()
public void setInvertSelection(boolean invert)
Parameters: invert - the new invert setting
public java.lang.String attributeIndicesTipText()
public java.lang.String getAttributeIndices()
public void setAttributeIndices(java.lang.String rangeList)
Parameters: rangeList - a string representing the list of attributes. Since the string will typically come from a user, attributes are indexed from 1.
Throws: java.lang.IllegalArgumentException - if an invalid range list is supplied
public void setAttributeIndicesArray(int[] attributes)
Parameters: attributes - an array containing indexes of attributes to discretize. Since the array will typically come from a program, attributes are indexed from 0.
Throws: java.lang.IllegalArgumentException - if an invalid set of ranges is supplied
public double[] getCutPoints(int attributeIndex)
Parameters: attributeIndex - the index (from 0) of the attribute to get the cut points of
protected void calculateCutPoints()
protected void calculateCutPointsByMDL(int index, Instances data)
Parameters: index - the index of the attribute to set cutpoints for
protected void calculateCutPointsByBinning(int index)
Parameters: index - the index of the attribute to set cutpoints for
protected void findNumBins(int index)
Parameters: index - the attribute index
protected void setOutputFormat()
protected void convertInstance(Instance instance)
Parameters: instance - the instance to convert
public static void main(java.lang.String[] argv)
Parameters: argv - should contain arguments to the filter: use -h for help