Question

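Let's say I've built a model (e.g. a J4.8 tree) and evaluated it with cross-validation. How can I use this model to classify a new dataset? I know I can put the data to classify in a file, select the "Supplied test set" option, tick "Output predictions" in the "More options" window, and run the classification again. That produces nearly what I need, but it seems like a very strange workflow. It also rebuilds the whole model, which can take unnecessary time. Is there a more straightforward way to classify data with an already built model?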

Answer 1

There are several ways to do this.

First

You can save and load a model from the command line; the -l and -d switches let you do that.

From the Weka docs:

-l 
    Sets model input file. In case the filename ends with '.xml',
    a PMML file is loaded or, if that fails, options are loaded
    from the XML file.
-d 
    Sets model output file. In case the filename ends with '.xml',
    only the options are saved to the XML file, not the model.
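A model file written with -d is a Java-serialized classifier object, so the save/load cycle can be sketched in plain Java. The following is only an illustration of that round trip, not Weka's own code: ThresholdModel is a hypothetical stand-in for a trained classifier, and the file names in the comments are made up. With weka.jar on the classpath, weka.core.SerializationHelper.write and weka.core.SerializationHelper.read do the same job for real models.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Command-line counterpart (file names are hypothetical):
//   java -cp weka.jar weka.classifiers.trees.J48 -t train.arff -d j48.model
//   java -cp weka.jar weka.classifiers.trees.J48 -l j48.model -T new.arff -p 0
class ModelRoundTrip {

    // Hypothetical stand-in for a trained classifier such as a J48 tree.
    static class ThresholdModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final double threshold;

        ThresholdModel(double threshold) {
            this.threshold = threshold;
        }

        // Returns a class index, like the generated Weka code does.
        double classify(double petalWidth) {
            return petalWidth <= threshold ? 0.0 : 1.0;
        }
    }

    // Roughly what -d does: write the trained object to a file.
    static void save(Object model, File file) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(model);
        }
    }

    // Roughly what -l does: read the object back, with no retraining.
    static Object load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new FileInputStream(file))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("demo", ".model");
        file.deleteOnExit();

        save(new ThresholdModel(0.6), file);
        ThresholdModel restored = (ThresholdModel) load(file);

        // The restored object classifies immediately, without a training step.
        System.out.println(restored.classify(0.2));
        System.out.println(restored.classify(1.4));
    }
}
```

The point of the sketch is that loading replaces training entirely, which is what removes the rebuild cost mentioned in the question.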

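Second

Right-click the generated model in the result list and use "Save model"; "Load model" in the same menu brings it back later. See the following image.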

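Third

You can also generate Java code for your classifier. That way you can save the classifier and reuse it. Follow these steps.

Click the "More options" button.
In the dialog that opens, select "Output source code".
Give the classifier a more meaningful name.

These steps output a Java class for your J48 classifier. The class WekaJ48ForIris below was created by Weka for the Iris dataset. You may need to refactor it a bit to make it more useful.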

class WekaJ48ForIris {

  public static double classify(Object[] i)
    throws Exception {

    double p = Double.NaN;
    p = WekaJ48ForIris.N26a305890(i);
    return p;
  }
  static double N26a305890(Object []i) {
    double p = Double.NaN;
    if (i[3] == null) {
      p = 0;
    } else if (((Double) i[3]).doubleValue() <= 0.6) {
      p = 0;
    } else if (((Double) i[3]).doubleValue() > 0.6) {
      p = WekaJ48ForIris.N18c079301(i);
    } 
    return p;
  }
  static double N18c079301(Object []i) {
    double p = Double.NaN;
    if (i[3] == null) {
      p = 1;
    } else if (((Double) i[3]).doubleValue() <= 1.7) {
      p = WekaJ48ForIris.N4544b022(i);
    } else if (((Double) i[3]).doubleValue() > 1.7) {
      p = 2;
    } 
    return p;
  }
  static double N4544b022(Object []i) {
    double p = Double.NaN;
    if (i[2] == null) {
      p = 1;
    } else if (((Double) i[2]).doubleValue() <= 4.9) {
      p = 1;
    } else if (((Double) i[2]).doubleValue() > 4.9) {
      p = WekaJ48ForIris.N3a0872863(i);
    } 
    return p;
  }
  static double N3a0872863(Object []i) {
    double p = Double.NaN;
    if (i[3] == null) {
      p = 2;
    } else if (((Double) i[3]).doubleValue() <= 1.5) {
      p = 2;
    } else if (((Double) i[3]).doubleValue() > 1.5) {
      p = 1;
    } 
    return p;
  }
}
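The generated class returns a class index rather than a label, and it casts every attribute to Double, so a small wrapper makes it easier to call. Below is a minimal sketch of such a wrapper. IrisLabeler, TreeClassifier, and the helper methods are hypothetical names, and the label order is assumed from the standard iris.arff header, so check it against your own training file.

```java
class IrisLabeler {

    // Assumed class order, taken from the standard iris.arff header.
    static final String[] LABELS = {
        "Iris-setosa", "Iris-versicolor", "Iris-virginica"
    };

    // Hypothetical interface matching the generated classify(Object[]) method.
    interface TreeClassifier {
        double classify(Object[] instance) throws Exception;
    }

    // The generated code casts each attribute to Double, so values must be
    // boxed Doubles, in the same attribute order as the training data.
    static Object[] toInstance(double sepalLength, double sepalWidth,
                               double petalLength, double petalWidth) {
        return new Object[] { sepalLength, sepalWidth, petalLength, petalWidth };
    }

    // Maps the returned class index to its label; NaN means no leaf was reached.
    static String label(TreeClassifier tree, Object[] instance) throws Exception {
        double index = tree.classify(instance);
        return Double.isNaN(index) ? "unknown" : LABELS[(int) index];
    }

    public static void main(String[] args) throws Exception {
        // Stand-in so the sketch compiles on its own; with the generated class
        // on the classpath, pass WekaJ48ForIris::classify instead.
        TreeClassifier tree = i -> ((Double) i[3]) <= 0.6 ? 0.0 : 1.0;

        System.out.println(label(tree, toInstance(5.1, 3.5, 1.4, 0.2)));
    }
}
```

Passing an Integer or Float in the array would fail with a ClassCastException inside the generated code, which is why the helper builds the array from double values.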

Answer 2

The misc package has a special class, SerializedClassifier, which takes a model file as a parameter and has a mock training phase.
