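I am trying to use Stanford CoreNLP for named entity recognition.
Some named entities consist of multiple tokens, for example Person: "Bill Smith". I can't figure out which API to use to determine when "Bill" and "Smith" should be treated as a single entity and when they should be two separate entities.
Is there any decent documentation that explains this?
Here is my current code: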
    InputStream is = getClass().getResourceAsStream(MODEL_NAME);
    if (MODEL_NAME.endsWith(".gz")) {
        is = new GZIPInputStream(is);
    }
    is = new BufferedInputStream(is);
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
    AbstractSequenceClassifier<CoreLabel> classifier = CRFClassifier.getClassifier(is);
    is.close();
    String text = "Hello, Bill Smith, how are you?";
    List<List<CoreLabel>> sentences = classifier.classify(text);
    for (List<CoreLabel> sentence : sentences) {
        for (CoreLabel word : sentence) {
            String type = word.get(CoreAnnotations.AnswerAnnotation.class);
            System.out.println(word + " is of type " + type);
        }
    }
Also, it is not clear to me why the "PERSON" annotation is returned as AnswerAnnotation rather than as CoreAnnotations.EntityClassAnnotation, EntityTypeAnnotation, or something else.
Answer 0 (score: 1)
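You should use the "entitymentions" annotator, which marks contiguous sequences of tokens that share the same NER tag as a single entity. The list of entities for each sentence is stored under the CoreAnnotations.MentionsAnnotation.class key. Each entity mention is itself a CoreMap.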
The following example code may help:
    import java.io.*;
    import java.util.*;
    import edu.stanford.nlp.ling.*;
    import edu.stanford.nlp.pipeline.*;
    import edu.stanford.nlp.util.*;

    public class EntityMentionsExample {
        public static void main(String[] args) throws IOException {
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,entitymentions");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
            String text = "Joe Smith is from Florida.";
            Annotation annotation = new Annotation(text);
            pipeline.annotate(annotation);
            System.out.println("---");
            System.out.println("text: " + text);
            // Print each entity mention with its NER tag, sentence by sentence.
            for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
                for (CoreMap entityMention : sentence.get(CoreAnnotations.MentionsAnnotation.class)) {
                    System.out.print(entityMention.get(CoreAnnotations.TextAnnotation.class));
                    System.out.print("\t");
                    System.out.print(
                        entityMention.get(CoreAnnotations.NamedEntityTagAnnotation.class));
                    System.out.println();
                }
            }
        }
    }
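To make the merging rule concrete, here is a minimal sketch in plain Java, without CoreNLP, of grouping contiguous tokens that share the same tag. It is not the annotator's actual implementation. The class name `MentionGrouper`, the `group` method, and the hand-written token and tag lists are all made up for illustration, and it assumes non-entity tokens are tagged "O".

```java
import java.util.ArrayList;
import java.util.List;

public class MentionGrouper {
    // Assumption: tokens that are not part of any entity carry the tag "O".
    static final String BACKGROUND = "O";

    /** Merges runs of consecutive tokens with the same non-background tag into "text\tTAG". */
    static List<String> group(List<String> tokens, List<String> tags) {
        List<String> mentions = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentTag = BACKGROUND;
        for (int i = 0; i < tokens.size(); i++) {
            String tag = tags.get(i);
            if (!tag.equals(currentTag)) {
                // The tag changed: close the open mention, if there is one.
                if (!currentTag.equals(BACKGROUND)) {
                    mentions.add(current + "\t" + currentTag);
                }
                current.setLength(0);
                currentTag = tag;
            }
            if (!tag.equals(BACKGROUND)) {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(tokens.get(i));
            }
        }
        // Close a mention that runs to the end of the sentence.
        if (!currentTag.equals(BACKGROUND)) {
            mentions.add(current + "\t" + currentTag);
        }
        return mentions;
    }

    public static void main(String[] args) {
        // Hand-written tags standing in for real classifier output.
        List<String> tokens = List.of("Hello", ",", "Bill", "Smith", ",", "how", "are", "you", "?");
        List<String> tags = List.of("O", "O", "PERSON", "PERSON", "O", "O", "O", "O", "O");
        for (String mention : group(tokens, tags)) {
            System.out.println(mention);
        }
    }
}
```

Note that a rule based only on matching tags cannot separate two different entities of the same type that sit directly next to each other; in this sketch they would be merged into one mention.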