Training a classifier with MALLET

Time: 2012-08-10 16:17:14

Tags: classification mallet

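I have a CSV file in the following format: productname,review

Using MALLET, I need to train a classifier so that, given a test dataset of product reviews as input, it tells me which product each review belongs to.

Any help with the MALLET Java API would be greatly appreciated.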

1 Answer:

Answer 0: (score: 8)

Here is a small example that fits your case:

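    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;

    import cc.mallet.classify.*;
    import cc.mallet.pipe.*;
    import cc.mallet.pipe.iterator.CsvIterator;
    import cc.mallet.types.InstanceList;

    // The class name is arbitrary; it only wraps main() so the example compiles.
    public class ProductReviewClassifier {
        public static void main(String[] args) throws IOException {
            // Prepare the instance transformation pipeline
            ArrayList<Pipe> pipes = new ArrayList<Pipe>();
            pipes.add(new Target2Label());                  // product name -> class label
            pipes.add(new CharSequence2TokenSequence());    // review text -> tokens
            pipes.add(new TokenSequence2FeatureSequence()); // tokens -> feature indices
            pipes.add(new FeatureSequence2FeatureVector()); // feature indices -> feature vector
            SerialPipes pipe = new SerialPipes(pipes);

            // Prepare training instances. Each line is split at the first comma:
            // group 1 = product name (target), group 2 = review (data), -1 = no name/URI group.
            InstanceList trainingInstanceList = new InstanceList(pipe);
            trainingInstanceList.addThruPipe(new CsvIterator(new FileReader("datasets/training.txt"), "([^,]*),(.*)", 2, 1, -1));

            // Prepare test instances; they go through the same pipe so both lists share alphabets
            InstanceList testingInstanceList = new InstanceList(pipe);
            testingInstanceList.addThruPipe(new CsvIterator(new FileReader("datasets/testing.txt"), "([^,]*),(.*)", 2, 1, -1));

            // Train a Naive Bayes classifier and report accuracy on the test set
            ClassifierTrainer trainer = new NaiveBayesTrainer();
            Classifier classifier = trainer.train(trainingInstanceList);
            System.out.println("Accuracy: " + classifier.getAccuracy(testingInstanceList));
        }
    }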
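
A note on the line pattern passed to `CsvIterator`: reviews often contain commas, so where the regex splits a line matters. The sketch below uses plain `java.util.regex` only (no MALLET) to compare a greedy `(.*),(.*)` with `([^,]*),(.*)`. The class name, the helper `split`, and the sample line are made up for illustration, and it assumes the product name itself contains no comma.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CsvLineSplitDemo {

    // Returns {group 1, group 2} for one line, or null if the pattern does not match.
    public static String[] split(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        // Hypothetical sample line: product name, then a review that contains commas
        String line = "kindle,Light, sharp screen, great battery";

        // Greedy first group: the split happens at the LAST comma
        String[] greedy = split("(.*),(.*)", line);
        // First group cannot cross a comma: the split happens at the FIRST comma
        String[] firstComma = split("([^,]*),(.*)", line);

        System.out.println("greedy label:      " + greedy[0]);
        System.out.println("first-comma label: " + firstComma[0]);
    }
}
```

With the greedy pattern, part of the review ends up in the label group, so every such line would produce its own spurious class. Splitting at the first comma keeps the product name as the label and the full review as the data.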