Question

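OK, my goal is to extract NEs (Person) from a text, together with the verbs related to them. For example, I have this text:

Dumbledore turned and walked back down the street. Harry Potter rolled over inside his blankets without waking up.

As an ideal result, I should get

Dumbledore turned walked; Harry Potter rolled

I use Stanford NER to find and tag persons, then I delete all sentences that do not contain an NE. So in the end I have a 'pure' text that consists only of sentences with character names. After that I use Stanford Dependencies. As a result I get something like this (CoNLL-U output format):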

1   Dumbledore  _   _   NN  _   2   nsubj   _   _
2   turned  _   _   VBD _   0   root    _   _
3   and _   _   CC  _   2   cc  _   _
4   walked  _   _   VBD _   2   conj    _   _
5   back    _   _   RB  _   4   advmod  _   _
6   down    _   _   IN  _   8   case    _   _
7   the _   _   DT  _   8   det _   _
8   street  _   _   NN  _   4   nmod    _   _
9   .   _   _   .   _   2   punct   _   _

1   Harry   _   _   NNP _   2   compound    _   _
2   Potter  _   _   NNP _   3   nsubj   _   _
3   rolled  _   _   VBD _   0   root    _   _
4   over    _   _   IN  _   3   compound:prt    _   _
5   inside  _   _   IN  _   7   case    _   _
6   his _   _   PRP$    _   7   nmod:poss   _   _
7   blankets    _   _   NNS _   3   nmod    _   _
8   without _   _   IN  _   9   mark    _   _
9   waking  _   _   VBG _   3   advcl   _   _
10  up  _   _   RP  _   9   compound:prt    _   _
11  .   _   _   .   _   3   punct   _   _

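This is where all my problems start. I know the person and the verb, but I don't know how to extract them from this format. I guess I could do it like this: find NN/NNP in the table, find its 'parent', and then extract all of its 'child' words. In theory it should work. In theory.

The question is whether anyone can suggest other ideas for getting a person and their actions from a text. Or is there a more sensible way to do this?

I would be very grateful for any help!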

Answer 1

Here is some sample code that should help with your problem:

import java.io.*;
import java.util.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.util.*;



public class NERAndVerbExample {

  public static void main(String[] args) throws IOException {
    // Build a pipeline: depparse supplies the dependency graph,
    // entitymentions groups NER-tagged tokens into mentions.
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,depparse,entitymentions");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    String text = "John Smith went to the store.";
    Annotation annotation = new Annotation(text);
    pipeline.annotate(annotation);
    System.out.println("---");
    System.out.println("text: " + text);
    System.out.println("");
    System.out.println("dependency edges:");
    for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
      SemanticGraph sg = sentence.get(SemanticGraphCoreAnnotations.CollapsedDependenciesAnnotation.class);
      for (SemanticGraphEdge sge : sg.edgeListSorted()) {
        System.out.println(
                sge.getGovernor().word() + "," + sge.getGovernor().index() + "," + sge.getGovernor().tag() + "," +
                        sge.getGovernor().ner()
                        + " - " + sge.getRelation().getLongName()
                        + " -> "
                        + sge.getDependent().word() + ","
                        + sge.getDependent().index() + "," + sge.getDependent().tag() + "," + sge.getDependent().ner());
      }
      System.out.println();
      System.out.println("entity mentions:");
      for (CoreMap entityMention : sentence.get(CoreAnnotations.MentionsAnnotation.class)) {
        // Print the mention text, the index of its last token, and its NER tag.
        int lastTokenIndex = entityMention.get(CoreAnnotations.TokensAnnotation.class).size()-1;
        System.out.println(entityMention.get(CoreAnnotations.TextAnnotation.class) +
                "\t" +
                entityMention.get(CoreAnnotations.TokensAnnotation.class)
                        .get(lastTokenIndex).get(CoreAnnotations.IndexAnnotation.class) + "\t" +
                entityMention.get(CoreAnnotations.NamedEntityTagAnnotation.class));
      }
    }
  }
}

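I hope to add some syntactic sugar in Stanford CoreNLP 3.8.0 to make working with entity mentions easier.

To explain this code: basically, the entitymentions annotator goes through the tokens and groups together consecutive ones that have the same NER tag. So "John Smith" gets marked as an entity mention.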
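To illustrate only the grouping idea (this is a plain-Java sketch, not CoreNLP's actual implementation), the snippet below merges consecutive tokens that share the same non-"O" tag. The class name MentionGroupingSketch, the method groupMentions, and the PERSON tags in the sample input are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MentionGroupingSketch {

  // Groups consecutive tokens that share the same non-"O" NER tag.
  // Each result is formatted as "mention text<TAB>tag".
  static List<String> groupMentions(List<String> words, List<String> nerTags) {
    List<String> mentions = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    String currentTag = "O";
    for (int i = 0; i < words.size(); i++) {
      String tag = nerTags.get(i);
      if (!tag.equals(currentTag)) {
        // The tag changed: close the open mention, if there is one.
        if (!currentTag.equals("O")) {
          mentions.add(current + "\t" + currentTag);
        }
        current.setLength(0);
        currentTag = tag;
      }
      if (!tag.equals("O")) {
        if (current.length() > 0) {
          current.append(' ');
        }
        current.append(words.get(i));
      }
    }
    // Close a mention that runs to the end of the sentence.
    if (!currentTag.equals("O")) {
      mentions.add(current + "\t" + currentTag);
    }
    return mentions;
  }

  public static void main(String[] args) {
    // Tags are assumed here; in practice they come from the ner annotator.
    List<String> words = Arrays.asList("John", "Smith", "went", "to", "the", "store", ".");
    List<String> tags = Arrays.asList("PERSON", "PERSON", "O", "O", "O", "O", "O");
    for (String mention : groupMentions(words, tags)) {
      System.out.println(mention);
    }
  }
}
```

With the assumed tags, "John" and "Smith" end up in a single PERSON mention, which is the behaviour described above.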

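If you walk through the dependency graph, you can get the index of each word.

Similarly, if you access the token list of an entity mention, you can find the index of each word in that mention.

With a bit more code you can link the two together on those indices and form the entity mention/verb pairs you are asking for.

As you can see in the current code, accessing information about entity mentions is quite cumbersome, so I will try to improve that in 3.8.0.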
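The linking step can be sketched without the CoreNLP API by working directly on the CoNLL-U rows shown in the question. This is a minimal sketch under some assumptions: rows are whitespace-separated, token indices run from 1 to n without gaps, and the sentence has already been filtered to ones that contain a person. The class SubjectVerbSketch and the method subjectVerbPairs are hypothetical names. It does not look at NER tags at all; in a real pipeline you would keep only subjects whose index falls inside a PERSON entity mention.

```java
import java.util.ArrayList;
import java.util.List;

class SubjectVerbSketch {

  // Takes the CoNLL-U rows of ONE sentence (columns: 0=index, 1=word,
  // 4=POS tag, 6=head index, 7=relation) and returns strings such as
  // "Harry Potter rolled".
  static List<String> subjectVerbPairs(String[] rows) {
    int n = rows.length;
    String[] word = new String[n + 1];
    String[] pos = new String[n + 1];
    String[] rel = new String[n + 1];
    int[] head = new int[n + 1];
    for (String row : rows) {
      String[] c = row.trim().split("\\s+");
      int i = Integer.parseInt(c[0]);
      word[i] = c[1];
      pos[i] = c[4];
      head[i] = Integer.parseInt(c[6]);
      rel[i] = c[7];
    }
    List<String> pairs = new ArrayList<>();
    for (int s = 1; s <= n; s++) {
      // A candidate subject is a noun attached to a verb by nsubj.
      boolean isSubject = rel[s].equals("nsubj") && pos[s].startsWith("NN")
          && head[s] != 0 && pos[head[s]].startsWith("VB");
      if (!isSubject) {
        continue;
      }
      StringBuilder sb = new StringBuilder();
      // Name = the subject token plus its "compound" dependents, in order.
      for (int t = 1; t <= n; t++) {
        if (t == s || (head[t] == s && rel[t].equals("compound"))) {
          if (sb.length() > 0) {
            sb.append(' ');
          }
          sb.append(word[t]);
        }
      }
      // Verbs = the governing verb plus verbs coordinated with it (conj).
      int verb = head[s];
      for (int t = 1; t <= n; t++) {
        boolean coordinated = head[t] == verb && rel[t].equals("conj")
            && pos[t].startsWith("VB");
        if (t == verb || coordinated) {
          sb.append(' ').append(word[t]);
        }
      }
      pairs.add(sb.toString());
    }
    return pairs;
  }

  public static void main(String[] args) {
    String[] sentence = {
      "1 Dumbledore _ _ NN _ 2 nsubj _ _",
      "2 turned _ _ VBD _ 0 root _ _",
      "3 and _ _ CC _ 2 cc _ _",
      "4 walked _ _ VBD _ 2 conj _ _",
      "5 back _ _ RB _ 4 advmod _ _",
      "6 down _ _ IN _ 8 case _ _",
      "7 the _ _ DT _ 8 det _ _",
      "8 street _ _ NN _ 4 nmod _ _",
      "9 . _ _ . _ 2 punct _ _"
    };
    for (String pair : subjectVerbPairs(sentence)) {
      System.out.println(pair);
    }
  }
}
```

Restricting the verbs to the nsubj governor and its conj dependents deliberately leaves out adverbial clauses such as "without waking up", which matches the output the question asks for.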
