错误的是,我在其他地方使用了CostSensitiveAnalysis,每个都有不同的权重矩阵。一个是
(matrix 1a)
0 3
1 0
另一个是
(matrix 1b)
0 1
3 0
并且取得了非常好的成绩。当我发现错误时,我将体重矩阵改为:
(matrix 2)
0 1
1 0
但无法获得与以前相同的结果。我也试过
(matrix 3)
0 3
3 0
我认为,通过使用成本敏感性分析而不是其他成本敏感性分析,使用矩阵1a和1b,我会得到与矩阵2甚至3相同的结果,但结果却非常不同。
成本敏感度分析是否以使用指定权重的其他方式更改成本值?
由于
-
我编写了一个测试单元用于说明,它应该适用于任何具有class属性作为最后一个的数据集,并且有两个类。
import static org.junit.Assert.*;
import java.io.IOException;
import org.apache.log4j.Logger;
import org.apache.log4j.lf5.util.Resource;
import org.junit.Test;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.CostSensitiveClassifier;
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
public class WekaFacadeTest {
private Logger logger = Logger.getLogger(WekaFacadeTest.class);
private CostMatrix createCostMatrix(double weightFalsePositive, double weightFalseNegative) {
CostMatrix costMatrix = new CostMatrix(2);
costMatrix.setCell(0, 0, 0.0);
costMatrix.setCell(1, 0, weightFalsePositive);
costMatrix.setCell(0, 1, weightFalseNegative);
costMatrix.setCell(1, 1, 0.0);
return costMatrix;
}
@Test
public void testDoubleCost() throws Exception {
Instances data = WekaFacade.loadArff("test.arff");
data.setClassIndex(data.numAttributes()-1);
// c1 => cost sensitive classifier applied to j48, cost = 1.0
CostSensitiveClassifier c1 = new CostSensitiveClassifier();
c1.setClassifier(new J48());
c1.setCostMatrix( createCostMatrix(1.0, 1.0));
c1.buildClassifier(data);
Evaluation ec1 = new Evaluation(data,c1.getCostMatrix());
ec1.evaluateModel(c1, data);
// c2 => no cost sensitive classifier, straight j48
J48 c2 = new J48();
c2.buildClassifier(data);
Evaluation ec2 = new Evaluation(data);
ec2.evaluateModel(c2, data);
// should c1 errorRate be equal to c2?
logger.info(String.format("Cost ec1=%f, ec2=%f",ec1.errorRate(),ec2.errorRate()));
assertEquals(ec1.errorRate(),ec2.errorRate(),0.0001);
// success!
// c3 => cost sensitive classifier applied to cost sensitive classifier applied to j48, cost = 1.0
CostSensitiveClassifier c3 = new CostSensitiveClassifier();
c3.setClassifier(new CostSensitiveClassifier());
((CostSensitiveClassifier)c3.getClassifier()).setClassifier(new J48());
c3.setCostMatrix( WekaFacade.createCostMatrix(1.0, 1.0));
((CostSensitiveClassifier)c3.getClassifier()).setCostMatrix( WekaFacade.createCostMatrix(1.0, 1.0));
c3.buildClassifier(data);
Evaluation ec3 = new Evaluation(data,c1.getCostMatrix());
ec3.evaluateModel(c3, data);
logger.info(String.format("Cost c3=%f, c1=%f",ec3.avgCost(),ec1.avgCost()));
assertEquals(ec3.avgCost(),ec1.avgCost(),0.0001);
// fail!
logger.info(String.format("ErrorRate c3=%f, c2=%f",ec3.errorRate(),ec2.errorRate()));
assertEquals(ec3.errorRate(),ec2.errorRate(),0.0001);
// fail!
// d => cost sensitive classifier applied to j48, normal situation
CostSensitiveClassifier d = new CostSensitiveClassifier();
d.setClassifier(new J48());
d.setCostMatrix( createCostMatrix(3.0, 1.0));
d.buildClassifier(data);
Evaluation ed = new Evaluation(data,d.getCostMatrix());
ed.evaluateModel(d, data);
// c => cost sensitive classifier applied to another cost sensitive classifier, abnormal situation
CostSensitiveClassifier c = new CostSensitiveClassifier();
c.setClassifier(new CostSensitiveClassifier());
((CostSensitiveClassifier)c.getClassifier()).setClassifier(new J48());
c.setCostMatrix( createCostMatrix(1.0, 1.0));
((CostSensitiveClassifier)c.getClassifier()).setCostMatrix( createCostMatrix(3.0, 1.0));
c.buildClassifier(data);
Evaluation ec = new Evaluation(data, c.getCostMatrix());
ec.evaluateModel(c, data );
// should ec average cost be the same as ed's ?
logger.info(String.format("Cost c=%f, d=%f",ec.avgCost(),ed.avgCost()));
assertEquals(ec.avgCost(),ed.avgCost(),0.0001);
// fails!
}
}
答案 0 :(得分:0)
到目前为止我发现了
CostSensitiveClassifier有两种操作模式:它可以在样本上设置显式权重(通过使用.weight()方法),也可以使用替换重新取样。在我的特定情况下,它使用的是最后一种方法。
因此,上述类排列将重新采样原始样本的两倍。重新采样随机过程,结果不应等于单个重采样。