StringToWordVector not working properly in Weka

Time: 2014-03-29 02:45:53

Tags: java filter weka bayesian

I'm using the StringToWordVector filter to convert my ARFF file into word-vector format.

But it throws an exception:

weka.core.WekaException: weka.classifiers.bayes.NaiveBayesMultinomialUpdateable: Not enough training instances with class labels (required: 1, provided: 0)!

I tried the same thing in the Weka Explorer, and it works fine there.

Here is my code:

    ArffLoader loader = new ArffLoader();
    loader.setFile(new File("valid file"));
    Instances structure = loader.getStructure();
    structure.setClassIndex(0);

    // train NaiveBayes
    NaiveBayesMultinomialUpdateable n = new NaiveBayesMultinomialUpdateable();
    FilteredClassifier f = new FilteredClassifier();
    StringToWordVector s = new StringToWordVector();

    f.setFilter(s);
    f.setClassifier(n);

    f.buildClassifier(structure);
    Instance current;
    while ((current = loader.getNextInstance(structure)) != null)
      n.updateClassifier(current);

    // output generated model
    System.out.println(n);

I tried another example, but it still doesn't work:

    ArffLoader loader = new ArffLoader();
    loader.setFile(new File("valid file"));
    Instances structure = loader.getStructure();

    // train NaiveBayes
    NaiveBayesMultinomialUpdateable n = new NaiveBayesMultinomialUpdateable();
    FilteredClassifier f = new FilteredClassifier();
    StringToWordVector s = new StringToWordVector();
    s.setInputFormat(structure);
    Instances struct = Filter.useFilter(structure, s);

    struct.setClassIndex(0);
    System.out.println(struct.numAttributes()); // only gives 2 or 1 attributes

    n.buildClassifier(struct);
    Instance current;
    while ((current = loader.getNextInstance(struct)) != null)
      n.updateClassifier(current);

    // output generated model
    System.out.println(n);

The printed number of attributes is always 2 or 1.

StringToWordVector doesn't seem to be working as expected.
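One thing that may contribute to the low attribute count in the second example: Filter.useFilter(structure, s) is applied to the header returned by loader.getStructure(), which contains no instances. StringToWordVector builds its word dictionary from the first batch of data passed through it, so a dictionary built from an empty header has no words to turn into attributes. The plain-Java sketch below is a hypothetical, simplified stand-in (the class DictionaryFromBatch and the sample sentences are invented for illustration; this is not Weka's implementation) that shows the effect: a header-only batch yields only the class attribute.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical, simplified stand-in for a dictionary-based word vectorizer.
// NOT Weka's code: it only mimics the idea that the word dictionary is
// built from the documents in the first batch the filter receives.
public class DictionaryFromBatch {

    // Build a sorted dictionary of lower-cased tokens from a batch of documents.
    static List<String> buildDictionary(List<String> batch) {
        TreeSet<String> words = new TreeSet<>();
        for (String doc : batch) {
            for (String token : doc.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!token.isEmpty()) {
                    words.add(token);
                }
            }
        }
        return new ArrayList<>(words);
    }

    // Output attribute count: one attribute per dictionary word, plus the class.
    static int outputAttributeCount(List<String> batch) {
        return buildDictionary(batch).size() + 1;
    }

    public static void main(String[] args) {
        // A header-only "batch" (like loader.getStructure()) holds no documents.
        List<String> headerOnly = new ArrayList<>();
        // Hypothetical sample documents standing in for the real data.
        List<String> fullData = List.of("Oil spill near the coast", "Police report filed");

        System.out.println(outputAttributeCount(headerOnly)); // 1 (class only)
        System.out.println(outputAttributeCount(fullData));   // 9 (8 words + class)
    }
}
```

With real data, the analogous step would be to pass actual instances through the filter (for example from ArffLoader.getDataSet()) so the dictionary is built from documents rather than from an empty header.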

Original folder: https://www.dropbox.com/sh/cma4hbe2r96ul1c/GL2wNdeVUz

Converted to ARFF: https://www.dropbox.com/s/efle6ci4lb5riq7/test1.arff

1 Answer:

Answer 0 (score: 1)

Judging by your ARFF file, the class is the second of the two attributes, so the problem may be here:

struct.setClassIndex(0);

This should be:

struct.setClassIndex(1);
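setClassIndex takes a 0-based index, so the second of two attributes is index 1. When the class is the last attribute, a common Weka idiom is data.setClassIndex(data.numAttributes() - 1). To make the 0-based counting concrete, the plain-Java sketch below (not Weka code) reads the @attribute lines of an ARFF header and returns the position of a named attribute. The helper ArffClassIndex and the two-attribute header (text, then class) are hypothetical, assumed to resemble the linked file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of Weka): returns the 0-based position of an
// attribute in an ARFF header, i.e. the number you would pass to setClassIndex.
public class ArffClassIndex {

    static int attributeIndex(String arffHeader, String name) {
        List<String> names = new ArrayList<>();
        for (String line : arffHeader.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.toLowerCase(Locale.ROOT).startsWith("@attribute")) {
                // The second whitespace-separated token is the attribute name.
                // Quoted names containing spaces are not handled in this sketch.
                String[] parts = trimmed.split("\\s+");
                if (parts.length > 1) {
                    names.add(parts[1]);
                }
            }
        }
        return names.indexOf(name); // -1 if the attribute is not found
    }

    public static void main(String[] args) {
        // Hypothetical two-attribute header: text first, class second.
        String header = String.join("\n",
                "@relation test1",
                "@attribute text string",
                "@attribute class {'oil spill',police}",
                "@data");
        System.out.println(attributeIndex(header, "class")); // 1
        System.out.println(attributeIndex(header, "text"));  // 0
    }
}
```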

Update: I made this change in your first example; it runs without any exceptions and prints:

The independent probability of a class
--------------------------------------
oil spill   40.0
police  989.0

The probability of a word given the class
-----------------------------------------
        oil spill   police  
class   Infinity    Infinity