I am using the Weka Java API to classify a handful of instances. The file I give Weka looks like this:
0.3,0.1,1
0.0,0.04,0
0.0,0.03,1
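Each of the instances above has a unique id assigned to it; for example, the first row has id 1098 ... I wrote the following code, which uses the Weka Java API to classify the instances and reports the ones that were classified incorrectly: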
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.util.ArrayList;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.LibSVM;
import weka.core.Instances;

public static void SVM(ArrayList<String[]> testData) throws Exception {
    // load the training set; the class is the last attribute
    BufferedReader breader = new BufferedReader(new FileReader("weka/train.txt"));
    Instances train = new Instances(breader);
    breader.close();
    train.setClassIndex(train.numAttributes() - 1);
    // load the test set
    BufferedReader treader = new BufferedReader(new FileReader("weka/test.txt"));
    Instances unlabeled = new Instances(treader);
    treader.close();
    // set class attribute
    unlabeled.setClassIndex(unlabeled.numAttributes() - 1);
    // create copy
    Instances labeled = new Instances(unlabeled);
    LibSVM svm = new LibSVM();
    svm.buildClassifier(train);
    Evaluation eval = new Evaluation(train);
    BufferedWriter writer = new BufferedWriter(new FileWriter("weka/labeledSVM.txt"));
    for (int i = 0; i < unlabeled.numInstances(); i++) {
        double clsLabel = svm.classifyInstance(unlabeled.instance(i));
        // compare against the class attribute rather than a hard-coded column index
        if (unlabeled.instance(i).classValue() != clsLabel) {
            writer.write("the unique id is: " + testData.get(i)[0] + " real label of the text is : " + unlabeled.instance(i).toString() + ", According to Algorithm result label is: " + clsLabel);
            writer.newLine();
        }
    }
    // flush and close once, after the loop has finished
    writer.flush();
    writer.close();
}
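The big problem is that the mapping between the unique ids and the instances labeled by the algorithm is incorrect. So I would like to know whether there is a way to include each text's unique id in my instances and tell the Weka classifier to ignore it.
For example: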
1980,0.3,0.1,1
1981,0.0,0.04,0
1982,0.0,0.03,0
Any other suggestions are appreciated.
Answer 0 (score: 0)
The only way I found to do this was to create my own subclass of Instance.
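Answer 1 (score: 0)
The "AddID" filter assigns a unique ID to each instance. Then use a FilteredClassifier, i.e. weka.classifiers.meta.FilteredClassifier, so that the ID attribute can be filtered out before the data reaches the underlying classifier.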
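A minimal sketch of how this could look, under several assumptions: both files are ARFF and already carry the ID as their first attribute (as in the 1980,0.3,0.1,1 layout from the question), the class is the last attribute and is nominal, and Weka 3.7 or later is on the classpath. The class name IdAwareClassification and the helpers load, ignoringId and misclassifiedIds are made up for illustration. Weka's built-in SMO stands in for the LibSVM classifier from the question; any other Classifier could be passed instead. If the data has no ID column yet, the AddID filter (weka.filters.unsupervised.attribute.AddID) can generate one first.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;

import weka.classifiers.Classifier;
import weka.classifiers.functions.SMO;
import weka.classifiers.meta.FilteredClassifier;
import weka.core.Instance;
import weka.core.Instances;
import weka.filters.unsupervised.attribute.Remove;

public class IdAwareClassification {

    /** Loads an ARFF file and marks the last attribute as the class (assumed layout). */
    public static Instances load(String path) throws Exception {
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            Instances data = new Instances(reader);
            data.setClassIndex(data.numAttributes() - 1);
            return data;
        }
    }

    /** Wraps a base classifier so the given attribute range (e.g. "first") is hidden from it. */
    public static FilteredClassifier ignoringId(Classifier base, String idRange) {
        Remove remove = new Remove();
        remove.setAttributeIndices(idRange); // Weka attribute ranges are 1-based
        FilteredClassifier fc = new FilteredClassifier();
        fc.setFilter(remove);
        fc.setClassifier(base);
        return fc;
    }

    /** Collects the IDs of all instances whose predicted class differs from the actual one. */
    public static List<Long> misclassifiedIds(Classifier trained, Instances test) throws Exception {
        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < test.numInstances(); i++) {
            Instance inst = test.instance(i);
            double predicted = trained.classifyInstance(inst);
            if (predicted != inst.classValue()) {
                // The ID is read from the same row that was classified,
                // so it cannot drift out of sync with a separate list.
                ids.add((long) inst.value(0));
            }
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        Instances train = load("weka/train.txt");
        Instances test = load("weka/test.txt");

        // The ID stays in the data, but the Remove filter strips it
        // before SMO sees the instances, both in training and prediction.
        FilteredClassifier fc = ignoringId(new SMO(), "first");
        fc.buildClassifier(train);

        for (long id : misclassifiedIds(fc, test)) {
            System.out.println("misclassified instance id: " + id);
        }
    }
}
```

Because the ID is stored inside each instance, there is no need for a parallel testData list, and the FilteredClassifier applies the same Remove filter at training and prediction time, so the ID never influences the model.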