Stanford NLP: named entities with multiple tokens

Date: 2016-11-15 20:13:12

Tags: stanford-nlp

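I'm trying to use Stanford CoreNLP for named entity recognition.

Some named entities consist of multiple tokens, for example Person: "Bill Smith". I can't figure out which API to use to determine when "Bill" and "Smith" should be treated as a single entity and when they should be two different entities.

Is there any decent documentation that explains this?

Here is my current code: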

    // Load the serialized CRF NER model from the classpath, gunzipping it if needed
    InputStream is = getClass().getResourceAsStream(MODEL_NAME);
    if (MODEL_NAME.endsWith(".gz")) {
        is = new GZIPInputStream(is);
    }
    is = new BufferedInputStream(is);

    // Note: props is never passed to anything below; the CRFClassifier is used directly
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");

    AbstractSequenceClassifier<CoreLabel> classifier = CRFClassifier.getClassifier(is);
    is.close();

    String text = "Hello, Bill Smith, how are you?";

    // classify() tokenizes and labels the text; each inner list is one sentence
    List<List<CoreLabel>> sentences = classifier.classify(text);
    for (List<CoreLabel> sentence: sentences) {
        for (CoreLabel word: sentence) {
            // Print each token together with its predicted label
            String type = word.get(CoreAnnotations.AnswerAnnotation.class);
            System.out.println(word + " is of type " + type);
        }
    }

Also, I'm not clear on why the "PERSON" label is returned as an AnswerAnnotation rather than as a CoreAnnotations.EntityClassAnnotation, EntityTypeAnnotation, or something else.

1 Answer:

Answer (score: 1)

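You should use the "entitymentions" annotator, which marks each contiguous sequence of tokens that share the same NER tag as a single entity. The list of entity mentions for each sentence is stored under the CoreAnnotations.MentionsAnnotation.class key, and each entity mention is itself a CoreMap.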
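To illustrate the grouping rule described above, here is a minimal, self-contained sketch in plain Java that merges consecutive tokens carrying the same tag into one mention. It does not use CoreNLP and is not how EntityMentionsAnnotator is actually implemented; the class GroupEntityTokens, its Mention type, the assumption that "O" marks tokens outside any entity, and joining tokens with single spaces are all illustrative choices. The token and tag lists stand in for, e.g., the word and AnswerAnnotation values printed by the loop in the question.

    import java.util.ArrayList;
    import java.util.List;

    public class GroupEntityTokens {

      /** Hypothetical holder for one run of same-tagged tokens. */
      static final class Mention {
        private final String text;
        private final String tag;
        Mention(String text, String tag) { this.text = text; this.tag = tag; }
        String text() { return text; }
        String tag() { return tag; }
      }

      /** Merges consecutive tokens with the same tag; tokens tagged "O" (assumed: non-entity) are skipped. */
      static List<Mention> group(List<String> tokens, List<String> tags) {
        List<Mention> mentions = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentTag = "O";
        for (int i = 0; i < tokens.size(); i++) {
          String tag = tags.get(i);
          if (!tag.equals(currentTag)) {
            // Tag changed: close the run in progress if it was an entity
            if (!currentTag.equals("O")) {
              mentions.add(new Mention(current.toString(), currentTag));
            }
            current.setLength(0);
            currentTag = tag;
          }
          if (!tag.equals("O")) {
            // Simplification: join tokens with one space instead of using character offsets
            if (current.length() > 0) current.append(' ');
            current.append(tokens.get(i));
          }
        }
        // Flush an entity that runs to the end of the sentence
        if (!currentTag.equals("O")) {
          mentions.add(new Mention(current.toString(), currentTag));
        }
        return mentions;
      }

      public static void main(String[] args) {
        List<String> tokens = List.of("Hello", ",", "Bill", "Smith", ",", "how", "are", "you", "?");
        List<String> tags = List.of("O", "O", "PERSON", "PERSON", "O", "O", "O", "O", "O");
        for (Mention m : group(tokens, tags)) {
          System.out.println(m.text() + "\t" + m.tag());  // prints "Bill Smith	PERSON"
        }
      }
    }

One consequence of this simple rule: two different entities of the same type that sit directly next to each other, with no differently tagged token between them, end up merged into a single mention.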

It may help to look at this code:

https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/pipeline/EntityMentionsAnnotator.java

Some sample code:

    import java.io.*;
    import java.util.*;
    import edu.stanford.nlp.ling.*;
    import edu.stanford.nlp.pipeline.*;
    import edu.stanford.nlp.util.*;

    public class EntityMentionsExample {

      public static void main (String[] args) throws IOException {
        // "entitymentions" runs after "ner" and groups same-tagged tokens into mentions
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,entitymentions");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        String text = "Joe Smith is from Florida.";
        Annotation annotation = new Annotation(text);
        pipeline.annotate(annotation);
        System.out.println("---");
        System.out.println("text: " + text);
        for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
          // Each entity mention is a CoreMap that may span several tokens
          for (CoreMap entityMention : sentence.get(CoreAnnotations.MentionsAnnotation.class)) {
            // Print the mention's full text and its NER tag, tab-separated
            System.out.print(entityMention.get(CoreAnnotations.TextAnnotation.class));
            System.out.print("\t");
            System.out.print(
                    entityMention.get(CoreAnnotations.NamedEntityTagAnnotation.class));
            System.out.println();
          }
        }
      }
    }