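I need to build a FilteredClassifier in Weka several times in a single run, each time on different training instances. I am posting sample code to make my point clear: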
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.filters.unsupervised.attribute.Remove;
...
Instances train = ... // from somewhere
Instances test = ... // from somewhere
// filter
Remove rm = new Remove();
rm.setAttributeIndices("1"); // remove 1st attribute
// classifier
J48 j48 = new J48();
j48.setUnpruned(true); // using an unpruned J48
// meta-classifier
FilteredClassifier fc = new FilteredClassifier();
fc.setFilter(rm);
fc.setClassifier(j48);
// train and make predictions
fc.buildClassifier(train);
for (int i = 0; i < test.numInstances(); i++) {
double pred = fc.classifyInstance(test.instance(i));
System.out.print("ID: " + test.instance(i).value(0));
System.out.print(", actual: " + test.classAttribute().value((int) test.instance(i).classValue()));
System.out.println(", predicted: " + test.classAttribute().value((int) pred));
}
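Inside the for loop, after printing the data to the console, I need to rebuild the FilteredClassifier (fc) on another training dataset. I am currently trying to do this without success: whether I reuse the same FilteredClassifier instance (fc) or create a new FilteredClassifier instance, Weka throws a NullPointerException.
How can I do what I want? If FilteredClassifier creates a thread, do I need any wait() or notify() calls so that its operation is paused while I use another FilteredClassifier instance?
This is the exception stack trace (from printStackTrace) raised by the JVM: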
java.lang.NullPointerException
at java.util.Hashtable.hash(Unknown Source)
at java.util.Hashtable.get(Unknown Source)
at weka.core.Attribute.addStringValue(Attribute.java:868)
at weka.core.StringLocator.copyStringValues(StringLocator.java:148)
at weka.core.StringLocator.copyStringValues(StringLocator.java:93)
at weka.filters.Filter.copyValues(Filter.java:364)
at weka.filters.Filter.bufferInput(Filter.java:301)
at weka.filters.unsupervised.attribute.StringToWordVector.input(StringToWordVector.java:697)
at weka.filters.Filter.useFilter(Filter.java:661)
at weka.classifiers.meta.FilteredClassifier.buildClassifier(FilteredClassifier.java:390)
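I appreciate any help...
Answer 0 (score: 1)
First of all, I don't know the cause, but this may be useful: I ran into exactly the same exception and solved it.
I was merging two datasets into one larger dataset. In short: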
for (int i=0; i < datasetB.numInstances(); i++) {
Instance instance = datasetB.instance(i);
datasetA.add(instance);
}
After this loop, datasetA contains A + B.
However, when I tried to use datasetA, as in
public MyResponse classify(String msg) {
...
// rebuild classifier and filter
Instances filteredData = Filter.useFilter(dataset, filter); //BREAKS
...
// classify
MyResponse response = classifier.classifyInstance(filteredInstance);
}
it fails with:
java.lang.NullPointerException
at java.util.Hashtable.hash(Unknown Source)
at java.util.Hashtable.get(Unknown Source)
at weka.core.Attribute.addStringValue(Attribute.java:868)
at weka.core.StringLocator.copyStringValues(StringLocator.java:148)
at weka.core.StringLocator.copyStringValues(StringLocator.java:93)
at weka.filters.Filter.copyValues(Filter.java:364)
at weka.filters.Filter.bufferInput(Filter.java:301)
at weka.filters.unsupervised.attribute.StringToWordVector.input(StringToWordVector.java:697)
at weka.filters.Filter.useFilter(Filter.java:661)
The solution is to treat each instance of datasetB as if it were a new one.
If you build a new instance, you do something like:
// Msg: String, Class: String
private Instance makeInstance(String text, String classValue) {
Instance instance = new Instance(2); // two attributes
Attribute messageAttribute = dataset.attribute("Msg");
instance.setValue(messageAttribute, messageAttribute.addStringValue(text));
instance.setDataset(this.dataset); // attach the dataset first so the class attribute is known
instance.setClassValue(classValue);
return instance;
}
Do the same for the instances of datasetB:
private Instance makeInstance(Instance i) {
Instance instance = new Instance(2); // two attributes
Attribute messageAttribute = dataset.attribute("Msg");
instance.setValue(messageAttribute, messageAttribute.addStringValue(getMsg(i)));
instance.setDataset(this.dataset);
instance.setClassValue(getClassValue(i));
return instance;
}
and call this method in the merge method:
for (int i=0; i < data.numInstances(); i++) {
Instance instance = data.instance(i);
Instance buildInstance = makeInstance(instance);
dataset.add(buildInstance);
}
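As for why rebuilding the instances helps: I have not confirmed this against the Weka source, so take it as a guess. An instance stores a string value as a numeric index into its attribute's own value table (that index is what addStringValue returns). An index taken from datasetB is therefore not necessarily meaningful in datasetA. The toy model below illustrates only that idea. StringTableDemo and StringTable are hypothetical names made up for this sketch and are not Weka classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these classes are hypothetical and are NOT part of Weka.
class StringTableDemo {

    /** Stands in for a string attribute that owns a table of distinct values. */
    static class StringTable {
        private final List<String> values = new ArrayList<>();
        private final Map<String, Integer> indexOf = new HashMap<>();

        /** Registers a string (if it is new) and returns its index in this table. */
        int addStringValue(String s) {
            Integer existing = indexOf.get(s);
            if (existing != null) {
                return existing;
            }
            values.add(s);
            indexOf.put(s, values.size() - 1);
            return values.size() - 1;
        }

        /** Returns the string stored at an index, or null if the index is out of range. */
        String value(int index) {
            return (index >= 0 && index < values.size()) ? values.get(index) : null;
        }
    }

    public static void main(String[] args) {
        StringTable tableA = new StringTable();
        StringTable tableB = new StringTable();
        tableA.addStringValue("hello");                 // index 0 in A
        tableB.addStringValue("spam");                  // index 0 in B
        int idxInB = tableB.addStringValue("buy now");  // index 1 in B

        // Reusing B's raw index against A's table finds nothing.
        System.out.println("raw index in A: " + tableA.value(idxInB));

        // Re-registering the text in A yields an index that is valid in A.
        int idxInA = tableA.addStringValue(tableB.value(idxInB));
        System.out.println("re-added in A: " + tableA.value(idxInA));
    }
}
```

In this model the raw index resolves to null in the other table, while re-adding the text (which is what makeInstance does through addStringValue) gives a usable index.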