Classifying data with Mulan - not working

Time: 2012-12-29 08:18:30

Tags: java machine-learning mulan

I want to classify some data with Mulan, but I get an exception:

mulan.data.DataLoadException: Error creating Instances data from supplied Reader data source
at mulan.data.MultiLabelInstances.loadInstances(MultiLabelInstances.java:469)
at mulan.data.MultiLabelInstances.loadInstances(MultiLabelInstances.java:458)
at mulan.data.MultiLabelInstances.<init>(MultiLabelInstances.java:168)

The main method comes from mulan.examples.TrainTestExperiment:

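import mulan.classifier.transformation.BinaryRelevance;
import mulan.data.MultiLabelInstances;
import mulan.evaluation.Evaluation;
import mulan.evaluation.Evaluator;
import weka.classifiers.Classifier;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.Utils;
import weka.filters.Filter;
import weka.filters.unsupervised.instance.RemovePercentage;

public class TrainTestExperiment {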

    public static void main(String[] args) {
        try {
            String path = Utils.getOption("path", args); // e.g. -path dataset/
            String filestem = Utils.getOption("filestem", args); // e.g. -filestem emotions
            String percentage = Utils.getOption("percentage", args); // e.g. -percentage 50 (for 50%)

            System.out.println("Loading the dataset");
            MultiLabelInstances mlDataSet = new MultiLabelInstances(path + filestem + ".arff", path + filestem + ".xml");

            // split the data set into train and test
            Instances dataSet = mlDataSet.getDataSet();
            RemovePercentage rmvp = new RemovePercentage();
            rmvp.setInvertSelection(true);
            rmvp.setPercentage(Double.parseDouble(percentage));
            rmvp.setInputFormat(dataSet);
            Instances trainDataSet = Filter.useFilter(dataSet, rmvp);

            rmvp = new RemovePercentage();
            rmvp.setPercentage(Double.parseDouble(percentage));
            rmvp.setInputFormat(dataSet);
            Instances testDataSet = Filter.useFilter(dataSet, rmvp);

            MultiLabelInstances train = new MultiLabelInstances(trainDataSet, path + filestem + ".xml");
            MultiLabelInstances test = new MultiLabelInstances(testDataSet, path + filestem + ".xml");

            Evaluator eval = new Evaluator();
            Evaluation results;

            Classifier brClassifier = new NaiveBayes();
            BinaryRelevance br = new BinaryRelevance(brClassifier);
            br.setDebug(true);
            br.build(train);
            results = eval.evaluate(br, test);
            System.out.println(results);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

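As for the data format, I have one dimension named title, with 160 categories.

The data file is formatted according to the ARFF format.

Some of the text is in Chinese.

Any help is appreciated.

Best regards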

1 Answer:

Answer 0: (score: 0)

This looks like a bug in Mulan.

See here for more details on the bug.
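
Since the question mentions that some of the ARFF text is Chinese, it may also be worth ruling out a file-encoding problem separately from the Mulan bug. The following is only a minimal sketch using the Java standard library: `Utf8Check` and `isValidUtf8` are hypothetical names, not part of Mulan or Weka, and it is an assumption (not something confirmed in this thread) that encoding has anything to do with the `DataLoadException`. It simply reports whether a file's bytes decode as strict UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical diagnostic helper; not part of Mulan or Weka.
final class Utf8Check {

    // Returns true if the bytes decode as UTF-8 with no malformed sequences.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // Strict decoding failed, so the file is in some other encoding.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to the .arff file, e.g. dataset/emotions.arff
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(args[0] + " is valid UTF-8: " + isValidUtf8(bytes));
    }
}
```

If the check reports that the file is not valid UTF-8 (for example because it was saved in GBK), re-saving it as UTF-8 and retrying would at least remove one variable. If the file is already valid UTF-8, the bug mentioned above remains the more likely explanation.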