我有一个csv文件,格式如下 productname,审查产品
现在使用mallet我必须训练分类器,这样如果测试数据集作为输入包含产品评论,它应该告诉我特定评论属于哪个产品
mallet java api帮助将不胜感激
答案 0 :(得分:8)
这是一个适合您案例的小例子:
public static void main(String[] args) throws IOException {
//prepare instance transformation pipeline
ArrayList<Pipe> pipes = new ArrayList<Pipe>();
pipes.add(new Target2Label());
pipes.add(new CharSequence2TokenSequence());
pipes.add(new TokenSequence2FeatureSequence());
pipes.add(new FeatureSequence2FeatureVector());
SerialPipes pipe = new SerialPipes(pipes);
//prepare training instances
InstanceList trainingInstanceList = new InstanceList(pipe);
trainingInstanceList.addThruPipe(new CsvIterator(new FileReader("datasets/training.txt"), "(.*),(.*)", 2, 1, -1));
//prepare test instances
InstanceList testingInstanceList = new InstanceList(pipe);
testingInstanceList.addThruPipe(new CsvIterator(new FileReader("datasets/testing.txt"), "(.*),(.*)", 2, 1, -1));
ClassifierTrainer trainer = new NaiveBayesTrainer();
Classifier classifier = trainer.train(trainingInstanceList);
System.out.println("Accuracy: " + classifier.getAccuracy(testingInstanceList));
}