我试图在他们的网站上使用Stanford tokenizer和以下示例:
import java.io.FileReader;
import java.io.IOException;
import java.util.List;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.DocumentPreprocessor;
import edu.stanford.nlp.process.PTBTokenizer;
public class TokenizerDemo {
public static void main(String[] args) throws IOException {
for (String arg : args) {
// option #1: By sentence.
DocumentPreprocessor dp = new DocumentPreprocessor(arg);
for (List sentence : dp) {
System.out.println(sentence);
}
// option #2: By token
PTBTokenizer ptbt = new PTBTokenizer(new FileReader(arg),
new CoreLabelTokenFactory(), "");
for (CoreLabel label; ptbt.hasNext(); ) {
label = ptbt.next();
System.out.println(label);
}
}
}
}
我尝试编译时遇到以下错误:
TokenizerDemo.java:24: error: incompatible types: Object cannot be converted to CoreLabel
label = ptbt.next();
有谁知道原因可能是什么?如果您感兴趣,我使用Java 1.8并确保CLASSPATH包含jar文件。
答案 0 :(得分:2)
尝试参数化PTBTokenizer
类。例如:
PTBTokenizer<CoreLabel> ptbt = new PTBTokenizer<>(new FileReader(arg),
new CoreLabelTokenFactory(), "");