weka.classifiers
Class CostMatrix

java.lang.Object
  |
  +--weka.core.Matrix
        |
        +--weka.classifiers.CostMatrix
All Implemented Interfaces:
java.lang.Cloneable, java.io.Serializable

public class CostMatrix
extends Matrix

Class for a misclassification cost matrix. The element in row j, column i is the cost of classifying an instance of class j as class i. Non-zero values on the diagonal are valid; they are typically negative, indicating some degree of "gain" from making a correct prediction.

Author:
Len Trigg (len@intelligenesis.net)
See Also:
Serialized Form

Field Summary
static java.lang.String FILE_EXTENSION
          The filename extension that should be used for cost files.
 
Fields inherited from class weka.core.Matrix
m_Elements
 
Constructor Summary
CostMatrix(CostMatrix toCopy)
          Creates a cost matrix identical to an existing matrix.
CostMatrix(int numClasses)
          Creates a default cost matrix for the given number of classes.
CostMatrix(java.io.Reader r)
          Creates a cost matrix from a cost file.
 
Method Summary
 Instances applyCostMatrix(Instances instances, java.util.Random random)
          Changes the dataset to reflect a given set of costs.
 double[] expectedCosts(double[] probabilities)
          Calculates the expected misclassification cost for each possible class value, given class probability estimates.
 double getMaxCost(int actualClass)
          Gets the maximum misclassification cost possible for a given actual class value.
 void initialize()
          Sets the costs to default values (i.e. 0 down the diagonal, and 1 for any misclassification).
static void main(java.lang.String[] args)
          Tests out creation of a frequency dependent cost matrix from the command line.
static CostMatrix makeFrequencyDependentMatrix(Instances instances, double weight)
          Creates a cost matrix for the class attribute of the supplied instances, where the misclassification costs are higher for misclassifying a rare class as a frequent one.
 void normalize()
          Normalizes the cost matrix so that diagonal elements are zero.
 void readOldFormat(java.io.Reader reader)
          Reads misclassification cost matrix from given reader.
 int size()
          Gets the number of classes.
 
Methods inherited from class weka.core.Matrix
add, addElement, clone, getElement, lubksb, ludcmp, multiply, numColumns, numRows, regression, regression, setColumn, setElement, setRow, toString, transpose, write
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

FILE_EXTENSION

public static java.lang.String FILE_EXTENSION
The filename extension that should be used for cost files.
Constructor Detail

CostMatrix

public CostMatrix(CostMatrix toCopy)
Creates a cost matrix identical to an existing matrix.
Parameters:
toCopy - the matrix to copy.

CostMatrix

public CostMatrix(int numClasses)
Creates a default cost matrix for the given number of classes. The default misclassification cost is 1.
Parameters:
numClasses - the number of classes

CostMatrix

public CostMatrix(java.io.Reader r)
           throws java.lang.Exception
Creates a cost matrix from a cost file.
Parameters:
r - a reader from which the cost matrix will be read
Throws:
java.lang.Exception - if an error occurs
Method Detail

makeFrequencyDependentMatrix

public static CostMatrix makeFrequencyDependentMatrix(Instances instances,
                                                      double weight)
                                               throws java.lang.Exception
Creates a cost matrix for the class attribute of the supplied instances, where the misclassification costs are higher for misclassifying a rare class as a frequent one. The cost of classifying an instance of class i as class j is weight * Pj / Pi, where Pi and Pj are Laplace estimates of the class probabilities.
Parameters:
instances - the dataset whose class distribution determines the costs
weight - the factor by which each probability ratio Pj / Pi is multiplied
Returns:
the frequency dependent cost matrix
Throws:
java.lang.Exception - if no class attribute is assigned, or the class attribute is not nominal
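
The formula above can be illustrated with a small standalone sketch. It is not the Weka implementation: the class name FrequencyCostSketch, the use of a plain array of class counts in place of Instances, the Laplace estimate Pk = (nk + 1) / (N + K), and leaving the diagonal at zero are all assumptions made for illustration.

```java
/** Illustrative sketch only; not the Weka implementation. */
final class FrequencyCostSketch {

    /**
     * classCounts[k] is the number of instances of class k
     * (a hypothetical stand-in for the Instances argument).
     */
    static double[][] frequencyDependentCosts(int[] classCounts, double weight) {
        int numClasses = classCounts.length;
        int total = 0;
        for (int count : classCounts) {
            total += count;
        }

        // Assumed Laplace estimate: P_k = (n_k + 1) / (N + K).
        double[] p = new double[numClasses];
        for (int k = 0; k < numClasses; k++) {
            p[k] = (classCounts[k] + 1.0) / (total + numClasses);
        }

        // costs[i][j] = weight * P_j / P_i for i != j; the diagonal is
        // left at zero here, which is an assumption of this sketch.
        double[][] costs = new double[numClasses][numClasses];
        for (int i = 0; i < numClasses; i++) {
            for (int j = 0; j < numClasses; j++) {
                if (i != j) {
                    costs[i][j] = weight * p[j] / p[i];
                }
            }
        }
        return costs;
    }

    public static void main(String[] args) {
        // Class 0 is frequent (9 instances), class 1 is rare (1 instance).
        double[][] costs = frequencyDependentCosts(new int[] {9, 1}, 1.0);
        System.out.println("frequent as rare: " + costs[0][1]);
        System.out.println("rare as frequent: " + costs[1][0]);
    }
}
```

With these counts, misclassifying the rare class as the frequent one costs roughly 25 times more than the reverse, which is the asymmetry the description refers to.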

readOldFormat

public void readOldFormat(java.io.Reader reader)
                   throws java.lang.Exception
Reads misclassification cost matrix from given reader. Each line has to contain three numbers: the index of the true class, the index of the incorrectly assigned class, and the weight, separated by white space characters. Comments can be appended to the end of a line by using the '%' character.
Parameters:
reader - the reader from which the cost matrix is to be read
Throws:
java.lang.Exception - if the cost matrix does not have the right format
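
A minimal sketch of a parser for the line format described above may help. It is not the Weka code: the class name OldFormatSketch, the zero-based class indices, the skipping of blank and comment-only lines, and starting from the default costs (0 on the diagonal, 1 elsewhere) are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Illustrative sketch only; not the Weka implementation. */
final class OldFormatSketch {

    static double[][] read(Reader reader, int numClasses) throws IOException {
        // Start from default costs: 0 on the diagonal, 1 elsewhere (assumed).
        double[][] costs = new double[numClasses][numClasses];
        for (int i = 0; i < numClasses; i++) {
            for (int j = 0; j < numClasses; j++) {
                costs[i][j] = (i == j) ? 0.0 : 1.0;
            }
        }

        BufferedReader in = new BufferedReader(reader);
        String line;
        while ((line = in.readLine()) != null) {
            // Everything from '%' to the end of the line is a comment.
            int comment = line.indexOf('%');
            if (comment >= 0) {
                line = line.substring(0, comment);
            }
            line = line.trim();
            if (line.isEmpty()) {
                continue; // assumed: blank and comment-only lines are skipped
            }
            String[] tokens = line.split("\\s+");
            if (tokens.length != 3) {
                throw new IOException("Expected three numbers: " + line);
            }
            try {
                int actual = Integer.parseInt(tokens[0]);     // true class (zero-based, assumed)
                int predicted = Integer.parseInt(tokens[1]);  // incorrectly assigned class
                double weight = Double.parseDouble(tokens[2]);
                if (actual < 0 || actual >= numClasses
                        || predicted < 0 || predicted >= numClasses) {
                    throw new IOException("Class index out of range: " + line);
                }
                costs[actual][predicted] = weight;
            } catch (NumberFormatException e) {
                throw new IOException("Not a number: " + line, e);
            }
        }
        return costs;
    }

    public static void main(String[] args) throws IOException {
        String file = "0 1 2.5 % cost of calling class 0 class 1\n1 0 10\n";
        double[][] costs = read(new StringReader(file), 2);
        System.out.println(costs[0][1] + " " + costs[1][0]);
    }
}
```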

initialize

public void initialize()
Sets the costs to default values (i.e. 0 down the diagonal, and 1 for any misclassification).
Overrides:
initialize in class Matrix

size

public int size()
Gets the number of classes.
Returns:
the number of classes

normalize

public void normalize()
Normalizes the cost matrix so that diagonal elements are zero. Each non-zero diagonal element is subtracted from every element in its row. For example:


 2  5
 3 -1
 

becomes


 0  3
 4  0
 

This normalization will affect total classification cost during evaluation, but will not affect the decision made by applying the minimum expected cost criterion during prediction.
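
The row-wise subtraction can be sketched on a plain two-dimensional array. This is an illustration under the assumption that rows index the actual class, as in the class description; NormalizeSketch is a hypothetical name and the code is not the Weka implementation.

```java
import java.util.Arrays;

/** Illustrative sketch only; not the Weka implementation. */
final class NormalizeSketch {

    /** Subtracts each row's diagonal element from every element in that row. */
    static void normalize(double[][] costs) {
        for (int row = 0; row < costs.length; row++) {
            double diagonal = costs[row][row];
            for (int col = 0; col < costs[row].length; col++) {
                costs[row][col] -= diagonal;
            }
        }
    }

    public static void main(String[] args) {
        // The example matrix from the description above.
        double[][] costs = { {2, 5}, {3, -1} };
        normalize(costs);
        System.out.println(Arrays.deepToString(costs)); // [[0.0, 3.0], [4.0, 0.0]]
    }
}
```

Because every cost in the row for actual class j shifts by the same amount, the expected cost of each candidate prediction shifts by the same total, so the class with the minimum expected cost is unchanged.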


applyCostMatrix

public Instances applyCostMatrix(Instances instances,
                                 java.util.Random random)
                          throws java.lang.Exception
Changes the dataset to reflect a given set of costs. Sets the weights of instances according to the misclassification cost matrix, or does resampling according to the cost matrix (if a random number generator is provided). Returns a new dataset.
Parameters:
instances - the instances to apply cost weights to.
random - a random number generator used for resampling; if none is provided, the instances are reweighted instead
Returns:
the new dataset
Throws:
java.lang.Exception - if the cost matrix does not have the right format

expectedCosts

public double[] expectedCosts(double[] probabilities)
                       throws java.lang.Exception
Calculates the expected misclassification cost for each possible class value, given class probability estimates.
Parameters:
probabilities - an array containing probability estimates for each class value.
Returns:
an array containing the expected misclassification cost for each class.
Throws:
java.lang.Exception - if the number of probabilities does not match the number of classes.
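
As a rough illustration of the calculation, the expected cost of predicting class i is the sum over actual classes j of P(j) times the cost in row j, column i. The sketch below works on plain arrays; the class and method names are hypothetical, it throws IllegalArgumentException rather than java.lang.Exception, and it is not the Weka implementation.

```java
import java.util.Arrays;

/** Illustrative sketch only; not the Weka implementation. */
final class ExpectedCostSketch {

    /**
     * costs[j][i] is the cost of predicting class i when the actual class
     * is j, matching the convention in the class description.
     */
    static double[] expectedCosts(double[][] costs, double[] probabilities) {
        if (probabilities.length != costs.length) {
            throw new IllegalArgumentException(
                "Number of probabilities does not match number of classes");
        }
        double[] expected = new double[costs.length];
        for (int predicted = 0; predicted < costs.length; predicted++) {
            for (int actual = 0; actual < costs.length; actual++) {
                expected[predicted] += probabilities[actual] * costs[actual][predicted];
            }
        }
        return expected;
    }

    /** Index of the class with the lowest expected cost. */
    static int minExpectedCostClass(double[] expected) {
        int best = 0;
        for (int i = 1; i < expected.length; i++) {
            if (expected[i] < expected[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] costs = { {0, 1}, {5, 0} };
        double[] probabilities = {0.75, 0.25};
        double[] expected = expectedCosts(costs, probabilities);
        System.out.println(Arrays.toString(expected));      // [1.25, 0.75]
        System.out.println(minExpectedCostClass(expected)); // 1
    }
}
```

In this example class 0 is the more probable class, yet predicting class 1 has the lower expected cost because missing an actual class 1 instance is five times as expensive.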

getMaxCost

public double getMaxCost(int actualClass)
Gets the maximum misclassification cost possible for a given actual class value.
Parameters:
actualClass - the index of the actual class value
Returns:
the highest cost possible for misclassifying this class
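
One way to picture this is as the largest entry in the row for the actual class. The sketch below assumes the whole row is scanned, diagonal included; that detail, like the name MaxCostSketch, is an assumption and not taken from the Weka source.

```java
/** Illustrative sketch only; not the Weka implementation. */
final class MaxCostSketch {

    /** Largest cost in the row of the given actual class (diagonal included, assumed). */
    static double maxCost(double[][] costs, int actualClass) {
        double max = Double.NEGATIVE_INFINITY;
        for (double cost : costs[actualClass]) {
            max = Math.max(max, cost);
        }
        return max;
    }

    public static void main(String[] args) {
        double[][] costs = {
            {0, 1, 4},
            {2, 0, 1},
            {3, 7, 0}
        };
        System.out.println(maxCost(costs, 2)); // 7.0
    }
}
```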

main

public static void main(java.lang.String[] args)
Tests out creation of a frequency dependent cost matrix from the command line. Either pipe a set of instances into System.in or give the name of a dataset as an argument. The last column will be treated as the class attribute, and a cost matrix with weight 1000 is output.
Parameters:
args - the command-line arguments; optionally, the name of a dataset