java.lang.Object
  +--weka.classifiers.Classifier
        +--weka.classifiers.MetaCost
This metaclassifier makes its base classifier cost-sensitive using the method specified in
Pedro Domingos (1999). MetaCost: A general method for making classifiers cost-sensitive, Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining, pp. 155-164. Also available online at http://www.cs.washington.edu/homes/pedrod/kdd99.ps.gz.
This classifier should produce similar results to one created by passing the base learner to Bagging, which is in turn passed to a CostSensitiveClassifier operating on minimum expected cost. The difference is that MetaCost produces a single cost-sensitive classifier of the base learner, giving the benefits of fast classification and interpretable output (if the base learner itself is interpretable). This implementation uses all bagging iterations when reclassifying training data (the MetaCost paper reports a marginal improvement when only those iterations containing each training instance are used in reclassifying that instance).
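The reclassification step described above can be sketched as follows. This is a minimal, hypothetical illustration, not Weka's code: it assumes each bagged model has already produced a class probability distribution for a training instance, averages those distributions over all bags (as this implementation does), and relabels the instance with the class of minimum expected cost. The cost layout cost[actual][predicted] and the names MetaCostRelabel, averageDistributions and minExpectedCostClass are assumptions made for the example.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of MetaCost's relabelling step; not Weka's source code.
 * Assumes cost[actual][predicted] holds the cost of predicting class
 * 'predicted' for an instance whose true class is 'actual'.
 */
public class MetaCostRelabel {

    /** Averages the class distributions that each bagged model gives one instance. */
    static double[] averageDistributions(double[][] perBag) {
        double[] avg = new double[perBag[0].length];
        for (double[] dist : perBag) {
            for (int c = 0; c < avg.length; c++) {
                avg[c] += dist[c];
            }
        }
        for (int c = 0; c < avg.length; c++) {
            avg[c] /= perBag.length;
        }
        return avg;
    }

    /** Returns the class j minimising the expected cost sum_i P(i|x) * cost[i][j]. */
    static int minExpectedCostClass(double[] probs, double[][] cost) {
        int best = 0;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int predicted = 0; predicted < probs.length; predicted++) {
            double expected = 0.0;
            for (int actual = 0; actual < probs.length; actual++) {
                expected += probs[actual] * cost[actual][predicted];
            }
            if (expected < bestCost) {
                bestCost = expected;
                best = predicted;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up estimates from three bagged models for one training instance.
        double[][] perBag = { {0.7, 0.3}, {0.6, 0.4}, {0.8, 0.2} };
        // Predicting class 0 when the truth is class 1 costs 5; the reverse costs 1.
        double[][] cost = { {0, 1}, {5, 0} };
        double[] p = averageDistributions(perBag);
        System.out.println(Arrays.toString(p) + " -> relabel as class "
                + minExpectedCostClass(p, cost));
    }
}
```

With the made-up numbers in main, class 0 is the more probable class (about 0.7), yet the instance is relabelled as class 1: predicting 0 has an expected cost of about 0.3 * 5 = 1.5, whereas predicting 1 costs about 0.7 * 1 = 0.7. The single final classifier is then the base learner trained on the relabelled data.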
Valid options are:
-W classname
Specify the full class name of a classifier (required).
-C cost file
File name of a cost matrix to use. If this is not supplied, a cost
matrix will be loaded on demand. The name of the on-demand file
is the relation name of the training data plus ".cost", and the
path to the on-demand file is specified with the -D option.
-D directory
Name of a directory to search for cost files when loading costs on demand
(default current directory).
-I num
Set the number of bagging iterations (default 10).
-S seed
Random number seed used when reweighting by resampling (default 1).
-P num
Size of each bag, as a percentage of the training size (default 100).
Options after -- are passed to the designated classifier.
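The "--" convention above splits one argument array into two parts: options for MetaCost itself and options for the base classifier. The helper below is a hypothetical plain-Java illustration (OptionSplit.split is an invented name, not a Weka method); the base learner class name and its -X flag in main are placeholders, not real Weka names.

```java
import java.util.Arrays;

/**
 * Hypothetical helper illustrating the "--" convention: everything after the
 * first "--" is meant for the base classifier. Not Weka's own parsing code.
 */
public class OptionSplit {

    /** Returns {metaCostOptions, baseClassifierOptions}. */
    static String[][] split(String[] argv) {
        for (int i = 0; i < argv.length; i++) {
            if (argv[i].equals("--")) {
                return new String[][] {
                    Arrays.copyOfRange(argv, 0, i),
                    Arrays.copyOfRange(argv, i + 1, argv.length)
                };
            }
        }
        // No "--" present: every option belongs to MetaCost itself.
        return new String[][] { argv.clone(), new String[0] };
    }

    public static void main(String[] args) {
        // "some.BaseLearner" and "-X 3" are placeholders for a real base learner and its options.
        String[] argv = {"-W", "some.BaseLearner", "-I", "20", "-P", "50", "--", "-X", "3"};
        String[][] parts = split(argv);
        System.out.println("MetaCost options: " + Arrays.toString(parts[0]));
        System.out.println("Base learner options: " + Arrays.toString(parts[1]));
    }
}
```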
Field Summary
protected int m_BagSizePercent
    The size of each bag sample, as a percentage of the training size.
protected Classifier m_Classifier
    The classifier.
protected java.lang.String m_CostFile
    The name of the cost file, for command line options.
protected CostMatrix m_CostMatrix
    The cost matrix.
protected int m_MatrixSource
    Indicates the current cost matrix source.
protected int m_NumIterations
    The number of bagging iterations.
protected java.io.File m_OnDemandDirectory
    The directory used when loading cost files on demand; null indicates the current directory.
protected int m_Seed
    Seed for reweighting using resampling.
static int MATRIX_ON_DEMAND
static int MATRIX_SUPPLIED
static Tag[] TAGS_MATRIX_SOURCE
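When the matrix source is MATRIX_ON_DEMAND, the option descriptions say the cost file is named after the training data's relation plus ".cost" and is looked up in the on-demand directory, with null meaning the current directory. A minimal sketch of that path resolution follows, assuming a hypothetical resolve helper; it does not load or parse the cost file, and it is not Weka's implementation.

```java
import java.io.File;

/**
 * Hypothetical sketch of where an on-demand cost file would be looked for:
 * relation name + ".cost", inside the on-demand directory (null = current
 * directory). Not Weka's actual implementation.
 */
public class OnDemandCostFile {

    static File resolve(File onDemandDirectory, String relationName) {
        String fileName = relationName + ".cost";
        // new File(null parent, child) behaves like new File(child),
        // i.e. a path relative to the current working directory.
        return new File(onDemandDirectory, fileName);
    }

    public static void main(String[] args) {
        // "weather" is an example relation name.
        System.out.println(resolve(null, "weather"));
        System.out.println(resolve(new File("costs"), "weather"));
    }
}
```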
Constructor Summary
MetaCost()
Method Summary
void buildClassifier(Instances data)
    Builds the model of the base learner.
double classifyInstance(Instance instance)
    Classifies a given test instance.
int getBagSizePercent()
    Gets the size of each bag, as a percentage of the training set size.
Classifier getClassifier()
    Gets the distribution classifier used.
protected java.lang.String getClassifierSpec()
    Gets the classifier specification string, which contains the class name of the classifier and any options to the classifier.
CostMatrix getCostMatrix()
    Gets the misclassification cost matrix.
SelectedTag getCostMatrixSource()
    Gets the source location method of the cost matrix.
int getNumIterations()
    Gets the number of bagging iterations.
java.io.File getOnDemandDirectory()
    Returns the directory that will be searched for cost files when loading on demand.
java.lang.String[] getOptions()
    Gets the current settings of the Classifier.
int getSeed()
    Gets the seed for resampling.
java.util.Enumeration listOptions()
    Returns an enumeration describing the available options.
static void main(java.lang.String[] argv)
    Main method for testing this class.
void setBagSizePercent(int newBagSizePercent)
    Sets the size of each bag, as a percentage of the training set size.
void setClassifier(Classifier classifier)
    Sets the distribution classifier.
void setCostMatrix(CostMatrix newCostMatrix)
    Sets the misclassification cost matrix.
void setCostMatrixSource(SelectedTag newMethod)
    Sets the source location of the cost matrix.
void setNumIterations(int numIterations)
    Sets the number of bagging iterations.
void setOnDemandDirectory(java.io.File newDir)
    Sets the directory that will be searched for cost files when loading on demand.
void setOptions(java.lang.String[] options)
    Parses a given list of options.
void setSeed(int seed)
    Sets the seed for resampling.
java.lang.String toString()
    Outputs a representation of this classifier.
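The -I, -S and -P settings (setNumIterations, setSeed, setBagSizePercent) control how the bags are drawn. The sketch below draws bootstrap samples of instance indices under stated assumptions: sampling with replacement via java.util.Random seeded with the given seed, and a bag size truncated to an integer. BagSampler and drawBags are hypothetical names, and Weka's own resampling code may differ, so these indices will not reproduce Weka's bags.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Hypothetical sketch of drawing numIterations bootstrap bags, each holding
 * (n * bagSizePercent / 100) instance indices sampled with replacement.
 * Not Weka's resampling code.
 */
public class BagSampler {

    static int[][] drawBags(int n, int numIterations, int bagSizePercent, int seed) {
        Random random = new Random(seed);           // same seed gives the same bags
        int bagSize = n * bagSizePercent / 100;     // assumption: truncating division
        int[][] bags = new int[numIterations][bagSize];
        for (int b = 0; b < numIterations; b++) {
            for (int k = 0; k < bagSize; k++) {
                bags[b][k] = random.nextInt(n);     // sampling with replacement
            }
        }
        return bags;
    }

    public static void main(String[] args) {
        // 10 training instances, 3 iterations, bags of 50% size, seed 1.
        for (int[] bag : drawBags(10, 3, 50, 1)) {
            System.out.println(Arrays.toString(bag));
        }
    }
}
```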
Methods inherited from class weka.classifiers.Classifier
forName, makeCopies
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Field Detail
public static final int MATRIX_ON_DEMAND
public static final int MATRIX_SUPPLIED
public static final Tag[] TAGS_MATRIX_SOURCE
protected int m_MatrixSource
protected java.io.File m_OnDemandDirectory
protected java.lang.String m_CostFile
protected Classifier m_Classifier
protected CostMatrix m_CostMatrix
protected int m_NumIterations
protected int m_Seed
protected int m_BagSizePercent
Constructor Detail
public MetaCost()
Method Detail
public java.util.Enumeration listOptions()
Specified by: listOptions in interface OptionHandler
public void setOptions(java.lang.String[] options) throws java.lang.Exception
Parses a given list of options. The valid options (-W, -C, -D, -I, -S and -P, with any options after -- passed to the designated classifier) are those listed in the class description.
Specified by: setOptions in interface OptionHandler
Parameters: options - the list of options as an array of strings
Throws: java.lang.Exception - if an option is not supported
public java.lang.String[] getOptions()
Specified by: getOptions in interface OptionHandler
public SelectedTag getCostMatrixSource()
public void setCostMatrixSource(SelectedTag newMethod)
Parameters: newMethod - the cost matrix location method.
public java.io.File getOnDemandDirectory()
public void setOnDemandDirectory(java.io.File newDir)
Parameters: newDir - the cost file search directory.
public void setClassifier(Classifier classifier)
Parameters: classifier - the distribution classifier with all options set.
public Classifier getClassifier()
protected java.lang.String getClassifierSpec()
public int getBagSizePercent()
public void setBagSizePercent(int newBagSizePercent)
Parameters: newBagSizePercent - the bag size, as a percentage.
public void setNumIterations(int numIterations)
public int getNumIterations()
public CostMatrix getCostMatrix()
public void setCostMatrix(CostMatrix newCostMatrix)
Parameters: newCostMatrix - the cost matrix.
public void setSeed(int seed)
Parameters: seed - the seed for resampling.
public int getSeed()
public void buildClassifier(Instances data) throws java.lang.Exception
Specified by: buildClassifier in class Classifier
Parameters: data - the training data
Throws: java.lang.Exception - if the classifier could not be built successfully
public double classifyInstance(Instance instance) throws java.lang.Exception
Specified by: classifyInstance in class Classifier
Parameters: instance - the instance to be classified
Throws: java.lang.Exception - if the instance could not be classified successfully
public java.lang.String toString()
Overrides: toString in class java.lang.Object
public static void main(java.lang.String[] argv)
Parameters: argv - should contain the following arguments: -t training file [-T test file] [-c class index]