Question

I'm new to Weka. I want to use Sequential Minimal Optimization (SMO) in WEKA. Can anyone tell me how to do it? Here is my Java code, but it doesn't work:

import java.io.File;
import java.io.FileOutputStream;
import java.io.ObjectOutputStream;
import java.util.Random;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.SMO;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SVMTest {
    public void test(File input) throws Exception {
        File tmp = new File("tmp-file-duplicate-pairs.arff");
        String path = input.getParent();
        //tmp.deleteOnExit();
        // Nothing in this method writes tmp: the removeFeatures(...) call below is commented out.
        ////removeFeatures(input,tmp,useType,useNames, useActivities, useOccupation,useFriends,useMailAndSite,useLocations);
        Instances data = new DataSource(tmp.getAbsolutePath()).getDataSet();
        data.setClassIndex(data.numAttributes() - 1); // the last attribute is the class
        Classifier c = null;
        String ctype = null;
        boolean newmodel = false;

        ctype = "SMO";
        SMO smo = new SMO();
        String[] options = {"-M"}; // -M: fit logistic (calibration) models to the SVM outputs
        smo.setOptions(options);   // called on SMO, since the Classifier interface in newer Weka has no setOptions
        c = smo;
        c.buildClassifier(data);
        newmodel = true;
        //c = loadClassifier(input.getParentFile().getParentFile(),ctype);
        if (newmodel)
            saveModel(c, ctype, input.getParentFile().getParentFile());
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(c, data, 10, new Random(1));

        System.out.println(c);
        System.out.println(eval.toSummaryString());
        System.out.println(eval.toClassDetailsString());
        System.out.println(eval.toMatrixString());

        tmp.delete();
    }

    private static void saveModel(Classifier c, String name, File path) throws Exception {
        // try-with-resources closes the stream; if the file cannot be opened, the exception
        // propagates instead of leaving oos null and failing later with a NullPointerException
        try (ObjectOutputStream oos = new ObjectOutputStream(
                new FileOutputStream(path.getAbsolutePath() + "/" + name + ".model"))) {
            oos.writeObject(c);
            oos.flush();
        }
    }
}
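The commented-out loadClassifier call is not shown in the question. As a minimal sketch, assuming plain Java serialization (which is what saveModel uses), a matching save/load round trip could look like this. ModelIO and DummyModel are hypothetical names; DummyModel only stands in for a trained Weka classifier, which is Serializable as well. Weka also ships weka.core.SerializationHelper with static write and read methods that serve the same purpose.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ModelIO {
    // Hypothetical stand-in for a trained model (Weka classifiers such as SMO are Serializable too)
    static class DummyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;

        DummyModel(String name) {
            this.name = name;
        }
    }

    // Writes any Serializable object; try-with-resources closes the stream even on failure
    static void save(Serializable model, File file) throws IOException {
        try (ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(file))) {
            oos.writeObject(model);
        }
    }

    // Reads the object back; the caller casts it to the expected type
    static Object load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new FileInputStream(file))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("SMO", ".model");
        f.deleteOnExit();
        save(new DummyModel("SMO"), f);
        DummyModel restored = (DummyModel) load(f);
        System.out.println(restored.name); // prints SMO
    }
}
```

With a real model, the cast would be to Classifier (or SMO) instead of DummyModel.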

I'd like to know how to provide the .arff file. My dataset is in the form of XML files.
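Weka can read its own XML-based XRFF format, but not arbitrary XML, so one option is to convert the XML to ARFF yourself. The sketch below assumes a hypothetical XML layout (pairs/pair elements with similarity, overlap and label children) and hypothetical attribute names; both would need to be adapted to the real files. It uses only the JDK's DOM parser and returns ARFF text, which can then be written to a .arff file and loaded with DataSource.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class XmlToArff {
    // Hypothetical XML layout: <pairs><pair><similarity/><overlap/><label/></pair>...</pairs>
    static String convert(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        StringBuilder arff = new StringBuilder();
        arff.append("@relation duplicate-pairs\n\n");
        arff.append("@attribute similarity numeric\n");
        arff.append("@attribute overlap numeric\n");
        // Nominal class attribute, declared last so setClassIndex(numAttributes() - 1) selects it
        arff.append("@attribute label {duplicate,distinct}\n\n");
        arff.append("@data\n");
        NodeList pairs = doc.getElementsByTagName("pair");
        for (int i = 0; i < pairs.getLength(); i++) {
            Element p = (Element) pairs.item(i);
            // One comma-separated data row per <pair>, in header order
            arff.append(text(p, "similarity")).append(',')
                .append(text(p, "overlap")).append(',')
                .append(text(p, "label")).append('\n');
        }
        return arff.toString();
    }

    // Trimmed text of the first descendant element with the given tag (assumed to exist)
    static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<pairs>"
                + "<pair><similarity>0.92</similarity><overlap>5</overlap><label>duplicate</label></pair>"
                + "<pair><similarity>0.10</similarity><overlap>0</overlap><label>distinct</label></pair>"
                + "</pairs>";
        System.out.print(convert(xml));
    }
}
```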

Answer 1

I guess you've figured it out by now, but in case it helps others, there is a wiki page about it:

http://weka.wikispaces.com/Text+categorization+with+WEKA

To use SMO, assuming you have some training instances "trainset" and a test set "testset", build the classifier:

    // train SMO on the training set
    SMO classifier = new SMO();
    classifier.buildClassifier(trainset);
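If you start from a single dataset, trainset and testset have to be split off first. Here is a hedged, JDK-only sketch of a seeded 80/20 hold-out split over generic rows; TrainTestSplit and its parameters are hypothetical. In Weka itself the same idea is usually written with data.randomize(new Random(1)) followed by the Instances(source, first, toCopy) copy constructor.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class TrainTestSplit {
    // Shuffles a copy of the rows with a fixed seed, then cuts it at the given fraction.
    // Returns [trainset, testset].
    static <T> List<List<T>> split(List<T> rows, double trainFraction, long seed) {
        List<T> shuffled = new ArrayList<>(rows);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(shuffled.size() * trainFraction);
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));               // trainset
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size()))); // testset
        return parts;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            rows.add("row" + i); // hypothetical rows
        }
        List<List<String>> parts = split(rows, 0.8, 1L);
        System.out.println("train=" + parts.get(0).size() + " test=" + parts.get(1).size()); // train=8 test=2
    }
}
```

Fixing the seed makes the split reproducible, in the same way new Random(1) does for the cross-validation below.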

Evaluate it with cross-validation, for example:

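    Evaluation eval = new Evaluation(testset);
    Random rand = new Random(1); // using seed = 1
    int folds = 10;
    // crossValidateModel trains and tests a fresh copy of classifier on each fold,
    // so the model built above is not itself the one being scored
    eval.crossValidateModel(classifier, testset, folds, rand);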

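eval then holds all the statistics, which you can print with toSummaryString(), toClassDetailsString() and toMatrixString().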
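To make it clearer what crossValidateModel computes, here is a rough JDK-only sketch of k-fold cross-validation. It is not Weka's implementation: Weka also stratifies the folds for nominal classes and trains a copy of the real classifier on each fold, whereas here a majority-class baseline stands in for SMO and the folds are only shuffled. The held-out predictions of all folds are pooled into one confusion matrix, roughly what eval.toMatrixString() prints, and the accuracy corresponds to eval.pctCorrect() divided by 100. CrossValidationSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class CrossValidationSketch {
    // Assigns each row index to one of k folds after a seeded shuffle (no stratification)
    static int[] assignFolds(int n, int k, long seed) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) idx.add(i);
        Collections.shuffle(idx, new Random(seed));
        int[] fold = new int[n];
        for (int pos = 0; pos < n; pos++) fold[idx.get(pos)] = pos % k;
        return fold;
    }

    // Stand-in "classifier" (NOT SMO): predicts the majority class of the training folds
    static int majority(int[] labels, int[] fold, int heldOut, int numClasses) {
        int[] counts = new int[numClasses];
        for (int i = 0; i < labels.length; i++) {
            if (fold[i] != heldOut) counts[labels[i]]++;
        }
        int best = 0;
        for (int c = 1; c < numClasses; c++) {
            if (counts[c] > counts[best]) best = c;
        }
        return best;
    }

    // Pools held-out predictions of every fold into one confusion matrix [actual][predicted]
    static int[][] crossValidate(int[] labels, int numClasses, int k, long seed) {
        int[] fold = assignFolds(labels.length, k, seed);
        int[][] matrix = new int[numClasses][numClasses];
        for (int f = 0; f < k; f++) {
            int prediction = majority(labels, fold, f, numClasses); // "train" on the other k-1 folds
            for (int i = 0; i < labels.length; i++) {
                if (fold[i] == f) matrix[labels[i]][prediction]++;   // "test" on fold f
            }
        }
        return matrix;
    }

    // Fraction of instances on the diagonal of the confusion matrix
    static double accuracy(int[][] m) {
        int correct = 0, total = 0;
        for (int a = 0; a < m.length; a++) {
            for (int p = 0; p < m.length; p++) {
                total += m[a][p];
                if (a == p) correct += m[a][p];
            }
        }
        return (double) correct / total;
    }

    public static void main(String[] args) {
        int[] labels = {0, 0, 0, 0, 0, 0, 0, 1, 1, 1}; // hypothetical class labels
        int[][] m = crossValidate(labels, 2, 10, 1L);
        System.out.println(Arrays.deepToString(m) + " accuracy=" + accuracy(m)); // [[7, 0], [3, 0]] accuracy=0.7
    }
}
```

With 10 instances and k = 10, each fold holds exactly one instance, so the baseline always predicts class 0 and scores 0.7 regardless of the seed.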

Answer 2

You can read the input file with the following lines:

Instances training_data = new Instances(new BufferedReader(
        new FileReader("tmp-file-duplicate-pairs.arff")));
training_data.setClassIndex(training_data.numAttributes() - 1);
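setClassIndex(numAttributes() - 1) marks the last declared attribute as the class, so the class attribute should be listed last in the ARFF header. The JDK-only sketch below is a simplified header reader, not Weka's parser (it ignores quoted names and sparse data); it shows which attribute that index selects for a small hypothetical file. ArffHeader and attributeNames are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ArffHeader {
    // Collects attribute names from the header, stopping at @data.
    // ARFF keywords are case-insensitive, so lines are compared in lower case.
    static List<String> attributeNames(Reader in) throws IOException {
        List<String> names = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            while ((line = br.readLine()) != null) {
                String t = line.trim();
                String lower = t.toLowerCase(Locale.ROOT);
                if (lower.startsWith("@data")) break;
                if (lower.startsWith("@attribute")) {
                    String[] parts = t.split("\\s+");
                    if (parts.length > 1) names.add(parts[1]); // second token is the name
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String arff = "@relation pairs\n"
                + "@attribute similarity numeric\n"
                + "@attribute overlap numeric\n"
                + "@attribute label {duplicate,distinct}\n"
                + "@data\n0.92,5,duplicate\n";
        List<String> names = attributeNames(new StringReader(arff));
        int classIndex = names.size() - 1; // mirrors setClassIndex(numAttributes() - 1)
        System.out.println(names.get(classIndex)); // prints label
    }
}
```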

Answer 3

The following link describes how to use SMO in Weka: http://preciselyconcise.com/apis_and_installations/training_a_weka_classifier_in_java.php

SMO, Sequential Minimal Optimization in WEKA
