Text classification in Weka using a precomputed model

Asked: 2013-08-01 22:43:44

Tags: data-mining classification weka sentiment-analysis

I have a sentiment analysis task. My training data consists of tweets labeled as negative or positive. I built a model using StringToWordVector and NaiveBayesMultinomial.

Code:

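import java.io.File;

import weka.classifiers.bayes.NaiveBayesMultinomial;
import weka.core.Instances;
import weka.core.converters.TextDirectoryLoader;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;

try {
    // load the labeled tweets; subdirectory names under ./train/ become the class labels
    TextDirectoryLoader loader = new TextDirectoryLoader();
    loader.setDirectory(new File("./train/"));
    Instances dataRaw = loader.getDataSet();
    System.out.println(loader.getStructure());

    // convert the raw text into word-count attributes
    StringToWordVector filter = new StringToWordVector();
    filter.setInputFormat(dataRaw);
    Instances dataFiltered = Filter.useFilter(dataRaw, filter);
    System.out.println("\n\nFiltered data:\n\n" + dataFiltered);

    // train Multinomial NaiveBayes classifier and output model
    NaiveBayesMultinomial classifier = new NaiveBayesMultinomial();
    classifier.buildClassifier(dataFiltered);
    //System.out.println("\n\nClassifier model:\n\n" + classifier);

    // save the model (a file path without a trailing slash, matching the path read back later)
    weka.core.SerializationHelper.write("./model/naviebayesmodel", classifier);

} catch (Exception ex) {
    ex.printStackTrace();
}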
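
For readers unfamiliar with what the two Weka components contribute, the following is a minimal plain-Java sketch of the idea, not Weka code and not the asker's model: a bag-of-words step fixes a vocabulary during training, and a multinomial Naive Bayes step scores new text against per-class word counts with add-one smoothing. The class name TinyMultinomialNB, its methods, and the example tweets are all hypothetical, and the tokenizer is a simplification that does not mirror StringToWordVector's defaults.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: a tiny multinomial Naive Bayes over word counts.
class TinyMultinomialNB {
    private final Set<String> vocabulary = new HashSet<>();                 // words seen in training
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>(); // label -> word -> count
    private final Map<String, Integer> totalWords = new HashMap<>();        // label -> total word count
    private final Map<String, Integer> docCounts = new HashMap<>();         // label -> number of documents
    private int totalDocs = 0;

    // Simplified tokenizer (an assumption): lowercase, split on non-word characters.
    static String[] tokenize(String text) {
        return text.toLowerCase().trim().split("\\W+");
    }

    void train(String text, String label) {
        totalDocs++;
        docCounts.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokenize(text)) {
            if (token.isEmpty()) continue;
            vocabulary.add(token);
            counts.merge(token, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
        }
    }

    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            // log prior plus smoothed log likelihood of every known word
            double score = Math.log((double) docCounts.get(label) / totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int total = totalWords.getOrDefault(label, 0);
            for (String token : tokenize(text)) {
                if (!vocabulary.contains(token)) continue; // words outside the training vocabulary are ignored
                int c = counts.getOrDefault(token, 0);
                score += Math.log((c + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        TinyMultinomialNB nb = new TinyMultinomialNB();
        nb.train("I love this great phone", "pos");
        nb.train("what a great happy day", "pos");
        nb.train("I hate this awful phone", "neg");
        nb.train("what a sad awful day", "neg");
        System.out.println(nb.classify("such a great day"));  // prints "pos"
        System.out.println(nb.classify("awful, I hate it"));  // prints "neg"
    }
}
```

The point of the sketch is that the vocabulary is part of the trained state: classification only makes sense against the same word list that was built during training.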

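Now I want to test this model on new tweets, but I can't figure out the testing part of the classifier. I tried the following code, but it does not pick up any instances. How can I test new tweets with the existing model?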

Code:

try{
        Classifier cls = (Classifier) weka.core.SerializationHelper.read("./model/naviebayesmodel");

        //Instances ins = (Instances)weka.core.SerializationHelper.read("./model/naviebayesmodel");
        //System.out.println(ins);
        //i.s
        TextDirectoryLoader loader = new TextDirectoryLoader();
        loader.setDirectory(new File("./test/-1/"));
        Instances dataRaw = loader.getDataSet();

        //String data = "hello, I am your test case. This is a great clasifier :) !!";
        StringToWordVector filter = new StringToWordVector();
        filter.setInputFormat(dataRaw);
        //Instances unlabeled = new Instances(new BufferedReader(new FileReader("./test/test.txt"))); 
        Instances dataFiltered = Filter.useFilter(dataRaw, filter);
        dataRaw.setClassIndex(dataRaw.numAttributes() - 1);

        //Instances dataFiltered = Filter.useFilter(unlabeled, filter);

        for (int i = 0; i < dataRaw.numInstances(); i++) {
            double clsLabel = cls.classifyInstance(dataRaw.instance(i));
            System.out.println(clsLabel);
        }
        //System.out.println(dataRaw.numInstances());

    }catch(Exception ex){
        ex.printStackTrace();
    }
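
A few things in the snippet above look like plausible causes of the trouble, stated here as assumptions rather than a confirmed diagnosis. TextDirectoryLoader uses subdirectory names as class labels, so pointing it at ./test/-1/ (a single class folder) may yield no instances. The loop also classifies dataRaw rather than dataFiltered, and the StringToWordVector is created fresh on the test data, so its attributes need not match the ones the model was trained on. The plain-Java sketch below illustrates only that last point; VocabularyReuse and its methods are hypothetical names, not Weka API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: why the training vocabulary must be reused at test time.
class VocabularyReuse {
    // Assign each distinct word a column index in order of first appearance.
    static Map<String, Integer> buildVocabulary(String... docs) {
        Map<String, Integer> vocab = new LinkedHashMap<>();
        for (String doc : docs) {
            for (String token : doc.toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    vocab.putIfAbsent(token, vocab.size());
                }
            }
        }
        return vocab;
    }

    // Count words of doc into the columns defined by vocab; unknown words are dropped.
    static int[] vectorize(String doc, Map<String, Integer> vocab) {
        int[] counts = new int[vocab.size()];
        for (String token : doc.toLowerCase().split("\\W+")) {
            Integer column = vocab.get(token);
            if (column != null) {
                counts[column]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> trainVocab = buildVocabulary("great phone", "awful phone");
        String tweet = "awful awful service";
        Map<String, Integer> freshVocab = buildVocabulary(tweet);

        // Same columns as training: "awful" lands in column 2.
        System.out.println(Arrays.toString(vectorize(tweet, trainVocab))); // prints [0, 0, 2]
        // Fresh vocabulary: different length and column meaning, useless to the trained model.
        System.out.println(Arrays.toString(vectorize(tweet, freshVocab))); // prints [2, 1]
    }
}
```

In Weka terms, one commonly used way to keep the two steps together is to wrap the filter and the classifier in a weka.classifiers.meta.FilteredClassifier, train and serialize that single object, and pass raw (unfiltered) instances to it at prediction time. Whether that fits this particular setup is an assumption that would need checking against the actual data layout.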

0 answers:

No answers