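How do I rebuild a FilteredClassifier multiple times?

Time: 2013-12-28 09:42:31

Tags: java weka

I need to build a filtered classifier in Weka several times in a single run, each time on different training instances. Here is sample code to make my point clear: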

import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.filters.unsupervised.attribute.Remove;
...
Instances train = ...         // from somewhere
Instances test = ...          // from somewhere
// filter
Remove rm = new Remove();
rm.setAttributeIndices("1");  // remove 1st attribute
// classifier
J48 j48 = new J48();
j48.setUnpruned(true);        // using an unpruned J48
// meta-classifier
FilteredClassifier fc = new FilteredClassifier();
fc.setFilter(rm);
fc.setClassifier(j48);
// train and make predictions
fc.buildClassifier(train);
for (int i = 0; i < test.numInstances(); i++) {
  double pred = fc.classifyInstance(test.instance(i));
  System.out.print("ID: " + test.instance(i).value(0));
  System.out.print(", actual: " + test.classAttribute().value((int) test.instance(i).classValue()));
  System.out.println(", predicted: " + test.classAttribute().value((int) pred));
}


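After the for loop that prints the data to the console, I need to rebuild the FilteredClassifier (fc) on another training dataset. So far I have had no success: whether I reuse the same FilteredClassifier instance (fc) or create a new FilteredClassifier instance, Weka throws a NullPointerException.

How can I do what I want? Does FilteredClassifier create a thread, so that I would need wait() or notify() to pause its operation while I use another FilteredClassifier instance?

This is the stack trace printed by the JVM: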

java.lang.NullPointerException
    at java.util.Hashtable.hash(Unknown Source)
    at java.util.Hashtable.get(Unknown Source)
    at weka.core.Attribute.addStringValue(Attribute.java:868)
    at weka.core.StringLocator.copyStringValues(StringLocator.java:148)
    at weka.core.StringLocator.copyStringValues(StringLocator.java:93)
    at weka.filters.Filter.copyValues(Filter.java:364)
    at weka.filters.Filter.bufferInput(Filter.java:301)
    at weka.filters.unsupervised.attribute.StringToWordVector.input(StringToWordVector.java:697)
    at weka.filters.Filter.useFilter(Filter.java:661)
    at weka.classifiers.meta.FilteredClassifier.buildClassifier(FilteredClassifier.java:390)

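I appreciate any help...

1 Answer:

Answer 0: (score: 1)

First of all, I do not know the cause, but this may be useful: I ran into exactly the same exception and solved it.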

I was merging two datasets into one larger dataset. In summary:

for (int i = 0; i < datasetB.numInstances(); i++) {
    Instance instance = datasetB.instance(i);
    datasetA.add(instance);
}

After this, datasetA contains A + B.

However, when I tried to use datasetA, like this:

public MyResponse classify(String msg) {
    ...

    // rebuild classifier and filter
    Instances filteredData = Filter.useFilter(dataset, filter); // BREAKS here
    ...

    // classify
    MyResponse response = classifier.classifyInstance(filteredInstance);
}

it failed with:

java.lang.NullPointerException
    at java.util.Hashtable.hash(Unknown Source)
    at java.util.Hashtable.get(Unknown Source)
    at weka.core.Attribute.addStringValue(Attribute.java:868)
    at weka.core.StringLocator.copyStringValues(StringLocator.java:148)
    at weka.core.StringLocator.copyStringValues(StringLocator.java:93)
    at weka.filters.Filter.copyValues(Filter.java:364)
    at weka.filters.Filter.bufferInput(Filter.java:301)
    at weka.filters.unsupervised.attribute.StringToWordVector.input(StringToWordVector.java:697)
    at weka.filters.Filter.useFilter(Filter.java:661)

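The solution is to treat each instance of datasetB as if it were a new one instead of adding it directly.

When you build a new instance, you do something like this:
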
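// Msg: String, Class: String
private Instance makeInstance(String text, String classValue) {
  Instance instance = new Instance(2); // two attributes (Weka 3.6 API; newer versions use DenseInstance)
  Attribute messageAttribute = dataset.attribute("Msg");
  // register the text in the target dataset's string attribute and store its index
  instance.setValue(messageAttribute, messageAttribute.addStringValue(text));
  // attach the dataset first: setting a class value by name needs the class attribute
  instance.setDataset(this.dataset);
  instance.setClassValue(classValue);
  return instance;
}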

Do the same for the instances of datasetB:

private Instance makeInstance(Instance i) {
    Instance instance = new Instance(2); // two attributes
    Attribute messageAttribute = dataset.attribute("Msg");
    // re-register the message text in the target dataset's string attribute
    instance.setValue(messageAttribute, messageAttribute.addStringValue(getMsg(i)));
    instance.setDataset(this.dataset);
    instance.setClassValue(getClassValue(i));
    return instance;
}

Then call this method in the merge method:

for (int i = 0; i < data.numInstances(); i++) {
    Instance instance = data.instance(i);
    Instance buildInstance = makeInstance(instance);
    dataset.add(buildInstance);
}
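
As for why rebuilding the instances helps, here is one possible reading of the stack trace. It is an assumption on my part, not something I have verified against the Weka source. A string attribute appears to keep its texts in a table owned by the attribute, while each instance stores only an index into that table. If an instance is moved to another dataset unchanged, its index may point at an entry the new dataset's table does not have, and a later lookup may then yield null. The toy model below is plain Java, not Weka code; `ToyAttribute`, `copyIndexOnly`, and `reRegister` are hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model (NOT Weka code) of a string attribute that keeps its values in a lookup table. */
class StringTableSketch {

    /** Hypothetical stand-in for a string attribute: texts live here, rows keep only an index. */
    static class ToyAttribute {
        private final List<String> values = new ArrayList<>();
        private final Map<String, Integer> indexOf = new HashMap<>();

        /** Registers the text if it is not known yet and returns its index in this table. */
        int addStringValue(String text) {
            Integer known = indexOf.get(text);
            if (known != null) {
                return known;
            }
            values.add(text);
            indexOf.put(text, values.size() - 1);
            return values.size() - 1;
        }

        /** Returns the text stored at the index, or null if this table has no such entry. */
        String value(int index) {
            return index >= 0 && index < values.size() ? values.get(index) : null;
        }
    }

    /** Keeps the raw index as-is, like adding a row from another dataset unchanged. */
    static int copyIndexOnly(int indexInSource) {
        return indexInSource;
    }

    /** Reads the text from the source table and registers it again in the target table. */
    static int reRegister(ToyAttribute source, int indexInSource, ToyAttribute target) {
        return target.addStringValue(source.value(indexInSource));
    }

    public static void main(String[] args) {
        ToyAttribute msgA = new ToyAttribute(); // "Msg" attribute of the target dataset
        ToyAttribute msgB = new ToyAttribute(); // "Msg" attribute of the source dataset
        msgA.addStringValue("hello");
        msgB.addStringValue("spam one");
        int indexInB = msgB.addStringValue("spam two");

        // The index is valid in B's table but has no entry in A's table.
        int naive = copyIndexOnly(indexInB);
        System.out.println("index copied unchanged resolves to: " + msgA.value(naive));

        // Registering the text in A's table gives an index that is valid for A.
        int fixed = reRegister(msgB, indexInB, msgA);
        System.out.println("re-registered text resolves to: " + msgA.value(fixed));
    }
}
```

In this model, `reRegister` plays the role of the `makeInstance(Instance i)` method above: the text is looked up in the source and added to the target's attribute before the new instance is attached to the target dataset.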