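I have been using the J48 classifier with training data of 25,011 records and test data of 1,000,000 records. The percentage of correctly classified instances in my output is very low (3.4%), and I am not sure how to improve it.
My data looks like this: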
3,9,2,6,4,11,4,12,2,4,Apple
4,1,4,10,3,13,3,4,1,10,Banana
2,1,2,10,4,4,4,1,4,13,Apple
2,12,4,3,1,10,1,12,4,9,Apple
1,7,3,11,3,3,4,8,3,7,Apple
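The last value on each line (Banana, Apple) is the class attribute, which I use as the class index. This attribute has 9 possible values in total. Here is where I run the J48 classifier: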
J48 j48 = new J48();
// "-U" tells J48 to build an unpruned tree
List<String> options = Collections.singletonList("-U");
j48.setOptions(options.toArray(new String[0]));
// Train on the 25,011-record training set
j48.buildClassifier(trainingData);
System.out.println("J48 Classifier\n");
System.out.println(j48);
Evaluation j48Evaluation = processEvaluation(trainingData);
j48Evaluation.evaluateModel(j48, testingData);
String j48Summary = processEvaluationSummary(j48Evaluation, J48_SUMMARY);
System.out.println(j48Summary);
String j48Matrix = processEvaluationMatrix(j48Evaluation, J48_MATRIX_SUMMARY);
System.out.println(j48Matrix);
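
For context on the 3.4% figure: with 9 class values, uniform random guessing would be right about 1 time in 9 (roughly 11.1%), so the result is below even that. A quick sanity check is to compare against the accuracy of always predicting the most frequent class. The following is a minimal sketch in plain Java (no Weka), assuming the data is available as comma-separated lines with the class label in the last field, as in the sample above. The class name `ClassBaseline` and its methods are hypothetical helpers for illustration, not part of Weka or of my code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Weka: inspects the class distribution
// of comma-separated rows whose last field is the class label.
class ClassBaseline {

    /** Counts how often each class label (the last comma-separated field) occurs. */
    static Map<String, Integer> classCounts(List<String> rows) {
        Map<String, Integer> counts = new HashMap<>();
        for (String row : rows) {
            String label = row.substring(row.lastIndexOf(',') + 1).trim();
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }

    /** Accuracy obtained by always predicting the most frequent class. */
    static double majorityBaseline(List<String> rows) {
        if (rows.isEmpty()) {
            return 0.0;
        }
        int max = 0;
        for (int count : classCounts(rows).values()) {
            max = Math.max(max, count);
        }
        return (double) max / rows.size();
    }

    public static void main(String[] args) {
        // The five sample rows shown in the question.
        List<String> rows = List.of(
            "3,9,2,6,4,11,4,12,2,4,Apple",
            "4,1,4,10,3,13,3,4,1,10,Banana",
            "2,1,2,10,4,4,4,1,4,13,Apple",
            "2,12,4,3,1,10,1,12,4,9,Apple",
            "1,7,3,11,3,3,4,8,3,7,Apple");

        System.out.println("Class counts: " + classCounts(rows));
        System.out.println("Majority-class baseline: " + majorityBaseline(rows));
        // Uniform guessing over the 9 class values mentioned above.
        System.out.println("Uniform chance with 9 classes: " + (1.0 / 9));
    }
}
```

If the trained tree scores below both of these numbers on the test set, that usually points to a setup problem (for example a mismatch between how the training and test files define the class attribute) rather than to the choice of J48 options, though that is only a guess without seeing the full data.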