I want to use Mulan to classify some data, but I get an exception:
mulan.data.DataLoadException: Error creating Instances data from supplied Reader data source
    at mulan.data.MultiLabelInstances.loadInstances(MultiLabelInstances.java:469)
    at mulan.data.MultiLabelInstances.loadInstances(MultiLabelInstances.java:458)
    at mulan.data.MultiLabelInstances.<init>(MultiLabelInstances.java:168)
The main method comes from mulan.examples.TrainTestExperiment:
    import mulan.classifier.transformation.BinaryRelevance;
    import mulan.data.MultiLabelInstances;
    import mulan.evaluation.Evaluation;
    import mulan.evaluation.Evaluator;
    import weka.classifiers.Classifier;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.core.Instances;
    import weka.core.Utils;
    import weka.filters.Filter;
    import weka.filters.unsupervised.instance.RemovePercentage;

    public class TrainTestExperiment {

        public static void main(String[] args) {
            try {
                String path = Utils.getOption("path", args); // e.g. -path dataset/
                String filestem = Utils.getOption("filestem", args); // e.g. -filestem emotions
                String percentage = Utils.getOption("percentage", args); // e.g. -percentage 50 (for 50%)

                System.out.println("Loading the dataset");
                MultiLabelInstances mlDataSet = new MultiLabelInstances(path + filestem + ".arff", path + filestem + ".xml");

                // Split the data set into train and test sets.
                // With inverted selection, RemovePercentage keeps the given percentage: training set.
                Instances dataSet = mlDataSet.getDataSet();
                RemovePercentage rmvp = new RemovePercentage();
                rmvp.setInvertSelection(true);
                rmvp.setPercentage(Double.parseDouble(percentage));
                rmvp.setInputFormat(dataSet);
                Instances trainDataSet = Filter.useFilter(dataSet, rmvp);

                // Without inversion it removes that percentage; the remainder is the test set.
                rmvp = new RemovePercentage();
                rmvp.setPercentage(Double.parseDouble(percentage));
                rmvp.setInputFormat(dataSet);
                Instances testDataSet = Filter.useFilter(dataSet, rmvp);

                // Wrap both splits as multi-label data using the same label definition file.
                MultiLabelInstances train = new MultiLabelInstances(trainDataSet, path + filestem + ".xml");
                MultiLabelInstances test = new MultiLabelInstances(testDataSet, path + filestem + ".xml");

                // Binary Relevance: one NaiveBayes classifier trained per label.
                Evaluator eval = new Evaluator();
                Evaluation results;
                Classifier brClassifier = new NaiveBayes();
                BinaryRelevance br = new BinaryRelevance(brClassifier);
                br.setDebug(true);
                br.build(train);
                results = eval.evaluate(br, test);
                System.out.println(results);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
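As for the data format, I have one attribute named title, which has 160 categories.
The data file is formatted according to the ARFF format.
Some of the text is in Chinese.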
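Since part of the ARFF file is Chinese, one thing worth ruling out is an encoding mismatch: if the file was saved in one encoding (for example GBK) but read as another, the non-ASCII values may be mis-decoded before the ARFF parser ever sees them. Below is a minimal, JDK-only sketch that checks whether the file is valid UTF-8, whether it starts with a UTF-8 byte-order mark (which, as an assumption, could trip up an ARFF header parser), and which default charset the JVM uses. The class name `ArffEncodingCheck` is hypothetical and not part of Mulan or Weka, and this sketch does not establish which charset Mulan itself uses to read the file; that depends on the Mulan/Weka version and should be verified separately.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical diagnostic helper (not part of Mulan or Weka): inspects the raw bytes of an ARFF file.
public class ArffEncodingCheck {

    // True if the bytes decode as UTF-8 with no malformed sequences (errors are reported, not replaced).
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // True if the data starts with the UTF-8 byte-order mark EF BB BF.
    static boolean hasUtf8Bom(byte[] bytes) {
        return bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java ArffEncodingCheck <file.arff>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("valid UTF-8:         " + isValidUtf8(bytes));
        System.out.println("starts with BOM:     " + hasUtf8Bom(bytes));
        // A Reader created without an explicit charset (e.g. FileReader) decodes with this default.
        System.out.println("JVM default charset: " + Charset.defaultCharset());
    }
}
```

Run it as `java ArffEncodingCheck path/to/file.arff`. If the file is not valid UTF-8, or the default charset differs from the one the file was saved in, re-saving the file in the charset the reader actually uses is one thing to try before looking further at the ARFF header itself.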
Any help is appreciated.
Best regards