weka.classifiers
Class MetaCost

java.lang.Object
  |
  +--weka.classifiers.Classifier
        |
        +--weka.classifiers.MetaCost
All Implemented Interfaces:
java.lang.Cloneable, OptionHandler, java.io.Serializable

public class MetaCost
extends Classifier
implements OptionHandler

This metaclassifier makes its base classifier cost-sensitive using the method described in:

Pedro Domingos (1999). MetaCost: A general method for making classifiers cost-sensitive, Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining, pp. 155-164. Also available online at http://www.cs.washington.edu/homes/pedrod/kdd99.ps.gz.

This classifier should produce similar results to one created by passing the base learner to Bagging, which is in turn passed to a CostSensitiveClassifier operating on minimum expected cost. The difference is that MetaCost produces a single cost-sensitive classifier of the base learner, giving the benefits of fast classification and interpretable output (if the base learner itself is interpretable). This implementation uses all bagging iterations when reclassifying training data (the MetaCost paper reports a marginal improvement when only those iterations containing each training instance are used in reclassifying that instance).
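
The minimum-expected-cost decision mentioned above can be sketched as follows. This is a minimal illustration, not the Weka implementation: the class name MinExpectedCostSketch, the method minCostClass, and the cost[actual][predicted] orientation of the matrix are all assumptions made for the example.

```java
/** Illustrative only: picks the prediction with the lowest expected misclassification cost. */
final class MinExpectedCostSketch {

    /**
     * @param probs class probability estimates for one instance
     * @param cost  cost[actual][predicted]; this orientation is an assumption of the sketch
     * @return index of the predicted class with the lowest expected cost
     */
    static int minCostClass(double[] probs, double[][] cost) {
        int best = 0;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int predicted = 0; predicted < probs.length; predicted++) {
            // Expected cost of predicting this class, weighted by how likely each actual class is.
            double expected = 0.0;
            for (int actual = 0; actual < probs.length; actual++) {
                expected += probs[actual] * cost[actual][predicted];
            }
            if (expected < bestCost) {
                bestCost = expected;
                best = predicted;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] probs = {0.8, 0.2};
        // Mistaking an actual class 1 for class 0 costs ten times more than the reverse.
        double[][] cost = {{0, 1}, {10, 0}};
        // Predicting 0 has expected cost 2.0, predicting 1 has expected cost 0.8.
        System.out.println(minCostClass(probs, cost)); // prints 1
    }
}
```

Although class 0 is the more probable class in this example, the asymmetric costs make class 1 the cheaper prediction, which is the effect relabeling by expected cost is meant to capture.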

Valid options are:

-W classname
Specify the full class name of a classifier (required).

-C cost file
File name of a cost matrix to use. If this is not supplied, a cost matrix will be loaded on demand. The name of the on-demand file is the relation name of the training data plus ".cost", and the path to the on-demand file is specified with the -D option.

-D directory
Name of a directory to search for cost files when loading costs on demand (default current directory).

-I num
Set the number of bagging iterations (default 10).

-S seed
Random number seed used when reweighting by resampling (default 1).

-P num
Size of each bag, as a percentage of the training size (default 100).

Options after -- are passed to the designated classifier.
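
To illustrate how an option array divides at the -- separator, here is a minimal sketch. It does not reproduce Weka's own option parsing: OptionSplitSketch and splitAtSeparator are hypothetical names, and some.pkg.BaseLearner and its -X option are placeholders rather than real classes or flags.

```java
import java.util.Arrays;

/** Illustrative only: separates meta-classifier options from base-classifier options. */
final class OptionSplitSketch {

    /** Returns {options before "--", options after "--"}. */
    static String[][] splitAtSeparator(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("--")) {
                return new String[][] {
                    Arrays.copyOfRange(args, 0, i),
                    Arrays.copyOfRange(args, i + 1, args.length)
                };
            }
        }
        // No separator: everything belongs to the meta-classifier.
        return new String[][] { args.clone(), new String[0] };
    }

    public static void main(String[] unused) {
        String[] args = {
            "-W", "some.pkg.BaseLearner", // placeholder class name
            "-I", "10", "-S", "1", "-P", "100",
            "--", "-X", "3"               // placeholder base-learner option
        };
        String[][] parts = splitAtSeparator(args);
        System.out.println("MetaCost options: " + Arrays.toString(parts[0]));
        System.out.println("Base options:     " + Arrays.toString(parts[1]));
    }
}
```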

Author:
Len Trigg (len@intelligenesis.net)
See Also:
Serialized Form

Field Summary
protected  int m_BagSizePercent
          The size of each bag sample, as a percentage of the training size
protected  Classifier m_Classifier
          The classifier
protected  java.lang.String m_CostFile
          The name of the cost file, for command line options
protected  CostMatrix m_CostMatrix
          The cost matrix
protected  int m_MatrixSource
          Indicates the current cost matrix source
protected  int m_NumIterations
          The number of iterations.
protected  java.io.File m_OnDemandDirectory
          The directory used when loading cost files on demand; null indicates the current directory
protected  int m_Seed
          Seed for reweighting using resampling.
static int MATRIX_ON_DEMAND
           
static int MATRIX_SUPPLIED
           
static Tag[] TAGS_MATRIX_SOURCE
           
 
Constructor Summary
MetaCost()
           
 
Method Summary
 void buildClassifier(Instances data)
          Builds the model of the base learner.
 double classifyInstance(Instance instance)
          Classifies a given test instance.
 int getBagSizePercent()
          Gets the size of each bag, as a percentage of the training set size.
 Classifier getClassifier()
          Gets the distribution classifier used.
protected  java.lang.String getClassifierSpec()
          Gets the classifier specification string, which contains the class name of the classifier and any options to the classifier
 CostMatrix getCostMatrix()
          Gets the misclassification cost matrix.
 SelectedTag getCostMatrixSource()
          Gets the source location method of the cost matrix.
 int getNumIterations()
          Gets the number of bagging iterations
 java.io.File getOnDemandDirectory()
          Returns the directory that will be searched for cost files when loading on demand.
 java.lang.String[] getOptions()
          Gets the current settings of the Classifier.
 int getSeed()
          Get seed for resampling.
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options
static void main(java.lang.String[] argv)
          Main method for testing this class.
 void setBagSizePercent(int newBagSizePercent)
          Sets the size of each bag, as a percentage of the training set size.
 void setClassifier(Classifier classifier)
          Sets the distribution classifier
 void setCostMatrix(CostMatrix newCostMatrix)
          Sets the misclassification cost matrix.
 void setCostMatrixSource(SelectedTag newMethod)
          Sets the source location of the cost matrix.
 void setNumIterations(int numIterations)
          Sets the number of bagging iterations
 void setOnDemandDirectory(java.io.File newDir)
          Sets the directory that will be searched for cost files when loading on demand.
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setSeed(int seed)
          Set seed for resampling.
 java.lang.String toString()
          Output a representation of this classifier
 
Methods inherited from class weka.classifiers.Classifier
forName, makeCopies
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

MATRIX_ON_DEMAND

public static final int MATRIX_ON_DEMAND

MATRIX_SUPPLIED

public static final int MATRIX_SUPPLIED

TAGS_MATRIX_SOURCE

public static final Tag[] TAGS_MATRIX_SOURCE

m_MatrixSource

protected int m_MatrixSource
Indicates the current cost matrix source

m_OnDemandDirectory

protected java.io.File m_OnDemandDirectory
The directory used when loading cost files on demand; null indicates the current directory

m_CostFile

protected java.lang.String m_CostFile
The name of the cost file, for command line options

m_Classifier

protected Classifier m_Classifier
The classifier

m_CostMatrix

protected CostMatrix m_CostMatrix
The cost matrix

m_NumIterations

protected int m_NumIterations
The number of iterations.

m_Seed

protected int m_Seed
Seed for reweighting using resampling.

m_BagSizePercent

protected int m_BagSizePercent
The size of each bag sample, as a percentage of the training size

Constructor Detail

MetaCost

public MetaCost()

Method Detail

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options
Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options. Valid options are:

-W classname
Specify the full class name of a classifier (required).

-C cost file
File name of a cost matrix to use. If this is not supplied, a cost matrix will be loaded on demand. The name of the on-demand file is the relation name of the training data plus ".cost", and the path to the on-demand file is specified with the -D option.

-D directory
Name of a directory to search for cost files when loading costs on demand (default current directory).

-I num
Set the number of bagging iterations (default 10).

-S seed
Random number seed used when reweighting by resampling (default 1).

-P num
Size of each bag, as a percentage of the training size (default 100).

Options after -- are passed to the designated classifier.

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the Classifier.
Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions

getCostMatrixSource

public SelectedTag getCostMatrixSource()
Gets the source location method of the cost matrix. Will be one of MATRIX_ON_DEMAND or MATRIX_SUPPLIED.
Returns:
the cost matrix source.

setCostMatrixSource

public void setCostMatrixSource(SelectedTag newMethod)
Sets the source location of the cost matrix. Values other than MATRIX_ON_DEMAND or MATRIX_SUPPLIED will be ignored.
Parameters:
newMethod - the cost matrix location method.

getOnDemandDirectory

public java.io.File getOnDemandDirectory()
Returns the directory that will be searched for cost files when loading on demand.
Returns:
The cost file search directory.

setOnDemandDirectory

public void setOnDemandDirectory(java.io.File newDir)
Sets the directory that will be searched for cost files when loading on demand.
Parameters:
newDir - The cost file search directory.
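
As a rough sketch of how the on-demand file location could be derived from the rules given for the -C and -D options (relation name plus ".cost", looked up in the on-demand directory, with null meaning the current directory): the class OnDemandCostFileSketch, the method costFileFor, and the relation name "weather" are hypothetical and are not taken from the Weka source.

```java
import java.io.File;

/** Illustrative only: derives the on-demand cost file location. */
final class OnDemandCostFileSketch {

    /**
     * @param onDemandDirectory directory to search, or null for the current directory
     * @param relationName      relation name of the training data
     */
    static File costFileFor(File onDemandDirectory, String relationName) {
        String fileName = relationName + ".cost";
        return (onDemandDirectory == null)
            ? new File(fileName)                     // relative path, resolved against the current directory
            : new File(onDemandDirectory, fileName);
    }

    public static void main(String[] args) {
        System.out.println(costFileFor(new File("costs"), "weather").getPath());
        System.out.println(costFileFor(null, "weather").getPath());
    }
}
```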

setClassifier

public void setClassifier(Classifier classifier)
Sets the distribution classifier
Parameters:
classifier - the distribution classifier with all options set.

getClassifier

public Classifier getClassifier()
Gets the distribution classifier used.
Returns:
the classifier

getClassifierSpec

protected java.lang.String getClassifierSpec()
Gets the classifier specification string, which contains the class name of the classifier and any options to the classifier
Returns:
the classifier string.

getBagSizePercent

public int getBagSizePercent()
Gets the size of each bag, as a percentage of the training set size.
Returns:
the bag size, as a percentage.

setBagSizePercent

public void setBagSizePercent(int newBagSizePercent)
Sets the size of each bag, as a percentage of the training set size.
Parameters:
newBagSizePercent - the bag size, as a percentage.
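
The relationship between the bag size percentage, the seed, and the resulting sample can be sketched as below. This is an assumption-laden illustration: BagSampleSketch, bagSize, and sampleIndices are hypothetical names, and both the truncating integer division and the use of java.util.Random are assumptions rather than documented behaviour of this class.

```java
import java.util.Random;

/** Illustrative only: draws one bootstrap sample of instance indices. */
final class BagSampleSketch {

    /** Number of instances per bag; integer division truncates (an assumption). */
    static int bagSize(int numInstances, int bagSizePercent) {
        return numInstances * bagSizePercent / 100;
    }

    /** Samples indices with replacement; the same seed yields the same bag. */
    static int[] sampleIndices(int numInstances, int bagSizePercent, int seed) {
        Random random = new Random(seed);
        int[] indices = new int[bagSize(numInstances, bagSizePercent)];
        for (int i = 0; i < indices.length; i++) {
            indices[i] = random.nextInt(numInstances);
        }
        return indices;
    }

    public static void main(String[] args) {
        int[] bag = sampleIndices(150, 50, 1);
        System.out.println("Bag holds " + bag.length + " indices drawn from 150 instances");
    }
}
```

Because sampling is with replacement, a bag of 100 percent generally still omits some training instances and repeats others.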

setNumIterations

public void setNumIterations(int numIterations)
Sets the number of bagging iterations.
Parameters:
numIterations - the number of bagging iterations

getNumIterations

public int getNumIterations()
Gets the number of bagging iterations
Returns:
the number of bagging iterations

getCostMatrix

public CostMatrix getCostMatrix()
Gets the misclassification cost matrix.
Returns:
the cost matrix

setCostMatrix

public void setCostMatrix(CostMatrix newCostMatrix)
Sets the misclassification cost matrix.
Parameters:
newCostMatrix - the cost matrix

setSeed

public void setSeed(int seed)
Set seed for resampling.
Parameters:
seed - the seed for resampling

getSeed

public int getSeed()
Get seed for resampling.
Returns:
the seed for resampling

buildClassifier

public void buildClassifier(Instances data)
                     throws java.lang.Exception
Builds the model of the base learner.
Overrides:
buildClassifier in class Classifier
Parameters:
data - the training data
Throws:
java.lang.Exception - if the classifier could not be built successfully
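
The class description states that all bagging iterations are used when reclassifying training data. One way to picture that step is to average, for a single training instance, the class probability estimates produced by every bagged model. The sketch below assumes each model yields a probability distribution and that the combination is a plain mean; VoteAverageSketch and average are hypothetical names, and the actual combination rule in this implementation may differ.

```java
/** Illustrative only: combines per-iteration class probability estimates for one instance. */
final class VoteAverageSketch {

    /**
     * @param perIteration perIteration[i][c] is the probability that model i assigns to class c
     * @return the mean probability of each class across all iterations
     */
    static double[] average(double[][] perIteration) {
        double[] mean = new double[perIteration[0].length];
        for (double[] distribution : perIteration) {
            for (int c = 0; c < mean.length; c++) {
                mean[c] += distribution[c] / perIteration.length;
            }
        }
        return mean;
    }

    public static void main(String[] args) {
        // Three bagged models, two classes.
        double[][] estimates = {{0.9, 0.1}, {0.6, 0.4}, {0.3, 0.7}};
        double[] mean = average(estimates);
        System.out.printf("class 0: %.2f, class 1: %.2f%n", mean[0], mean[1]);
    }
}
```

The averaged distribution is what would then be weighed against the cost matrix to choose the label on which the final base classifier is trained.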

classifyInstance

public double classifyInstance(Instance instance)
                        throws java.lang.Exception
Classifies a given test instance.
Overrides:
classifyInstance in class Classifier
Parameters:
instance - the instance to be classified
Throws:
java.lang.Exception - if instance could not be classified successfully

toString

public java.lang.String toString()
Output a representation of this classifier
Overrides:
toString in class java.lang.Object

main

public static void main(java.lang.String[] argv)
Main method for testing this class.
Parameters:
argv - should contain the following arguments: -t training file [-T test file] [-c class index]