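Is there a way to use the Stanford NER library with a list of tokens as input and extract the named entities?
I have checked the API, but it is not explicit about this. Most of the time the input is a string or a document, and in both cases tokenization is done behind the scenes.
In my case I really have to tokenize first and then pass the list of tokens to the API. I noticed that I can do this: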
List<HasWord> words = new ArrayList<>();
words.add(new Word("Tesco"));
..... //adding elements to words
List<CoreLabel> labels = classifier.classifySentence(words);
Is this correct?
Thanks a lot!
Answer 0: (score: 2)
You can use the Sentence.toCoreLabelList method:
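// classifier is assumed to be an already-loaded sequence classifier, e.g. a CRFClassifier<CoreLabel>.
// Note: in newer CoreNLP releases this helper class may be named SentenceUtils instead of Sentence.
String[] token_strs = {"John", "met", "Amy", "in", "Los", "Angeles"};
List<CoreLabel> tokens = edu.stanford.nlp.ling.Sentence.toCoreLabelList(token_strs);
for (CoreLabel cl : classifier.classifySentence(tokens)) {
    System.out.println(cl.toShorterString());
}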
Output:
[Value=John Text=John Position=0 Answer=PERSON Shape=Xxxx DistSim=463]
[Value=met Text=met Position=1 Answer=O Shape=xxxk DistSim=476]
[Value=Amy Text=Amy Position=2 Answer=PERSON Shape=Xxx DistSim=396]
[Value=in Text=in Position=3 Answer=O Shape=xxk DistSim=510]
[Value=Los Text=Los Position=4 Answer=LOCATION Shape=Xxx DistSim=449]
[Value=Angeles Text=Angeles Position=5 Answer=LOCATION Shape=Xxxxx DistSim=199]
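Since the question asks about extracting the entities rather than just tagging tokens, here is a minimal sketch of how the per-token labels above could be merged into entity spans. It uses only the standard library and works on two parallel lists; the class name EntitySpans and the method group are made up for this illustration and are not part of Stanford NER. In real code the words and tags would be read from each CoreLabel (for example the Answer value shown in the output). It also assumes the plain tagging scheme seen above (no B-/I- prefixes), so two different entities of the same type that sit directly next to each other would be merged into one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Stanford NER.
class EntitySpans {
    // Merges consecutive tokens that share the same non-"O" tag into one "text/TAG" entry.
    static List<String> group(List<String> words, List<String> tags) {
        if (words.size() != tags.size()) {
            throw new IllegalArgumentException("words and tags must have the same length");
        }
        List<String> spans = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentTag = "O";
        for (int i = 0; i < words.size(); i++) {
            String tag = tags.get(i);
            if (!tag.equals(currentTag)) {
                // The tag changed: close the span that was open, if any.
                if (!currentTag.equals("O")) {
                    spans.add(current + "/" + currentTag);
                }
                current.setLength(0);
                currentTag = tag;
            }
            if (!tag.equals("O")) {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(words.get(i));
            }
        }
        // Close a span that runs to the end of the sentence.
        if (!currentTag.equals("O")) {
            spans.add(current + "/" + currentTag);
        }
        return spans;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("John", "met", "Amy", "in", "Los", "Angeles");
        List<String> tags = Arrays.asList("PERSON", "O", "PERSON", "O", "LOCATION", "LOCATION");
        // prints [John/PERSON, Amy/PERSON, Los Angeles/LOCATION]
        System.out.println(group(words, tags));
    }
}
```

Keeping the grouping separate from the classifier call means it can be reused with whichever tagging route you end up choosing.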
Answer 1: (score: 2)
Here is one way to solve this:
import java.util.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.util.*;

public class NERPreToken {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner");
        // Split on whitespace only, so the pre-made tokens are kept as they are.
        props.setProperty("tokenize.whitespace", "true");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // Tokens produced by your own tokenizer.
        String[] tokensArray = {"Stephen", "Colbert", "hosts", "a", "show", "on", "CBS", "."};
        List<String> tokensList = Arrays.asList(tokensArray);
        // Join the tokens with single spaces and annotate the resulting string.
        String docString = String.join(" ", tokensList);
        Annotation annotation = new Annotation(docString);
        pipeline.annotate(annotation);

        // Print each token together with its NER tag.
        List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            List<CoreLabel> tokens = sentence.get(CoreAnnotations.TokensAnnotation.class);
            for (CoreLabel token : tokens) {
                System.out.println(token.word() + " " + token.get(CoreAnnotations.NamedEntityTagAnnotation.class));
            }
        }
    }
}
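The key here is to start from your list of tokens and set the pipeline's tokenize property so that it tokenizes on whitespace. Then submit a string containing the tokens joined by spaces.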
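One caveat with the join-by-space approach: it only returns your tokens unchanged if none of them is empty or contains whitespace itself (a token such as "Los Angeles" would be split in two). Below is a small standard-library sketch of a guard you could run before building the string. The class and method names are invented for this example, and splitOnWhitespace merely imitates a whitespace tokenizer; how CoreNLP defines whitespace exactly is an assumption here and not something this code checks.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of CoreNLP.
class WhitespaceJoinCheck {
    // Joins tokens with single spaces, rejecting tokens that a
    // whitespace-only tokenizer could not return unchanged.
    static String joinForWhitespaceTokenizer(List<String> tokens) {
        for (String t : tokens) {
            if (t.isEmpty() || t.chars().anyMatch(Character::isWhitespace)) {
                throw new IllegalArgumentException(
                        "token is empty or contains whitespace: \"" + t + "\"");
            }
        }
        return String.join(" ", tokens);
    }

    // Stand-in for a whitespace tokenizer: splits on runs of whitespace.
    static List<String> splitOnWhitespace(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        List<String> tokens =
                Arrays.asList("Stephen", "Colbert", "hosts", "a", "show", "on", "CBS", ".");
        String doc = joinForWhitespaceTokenizer(tokens);
        // prints true: splitting the joined string gives back the same tokens
        System.out.println(splitOnWhitespace(doc).equals(tokens));
    }
}
```

If a token does contain whitespace, you would have to decide how to handle it (for instance by replacing the inner space) before handing the string to the pipeline.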