Extracting head nouns

Asked: 2015-03-25 20:05:42

Tags: java stanford-nlp

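How do I extract head nouns? I tried a constituency parser, which did not work, so I think I need to use a dependency parser. I ran this demo code, but it gave me the wrong answer.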

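import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;

import edu.stanford.nlp.io.IOUtils;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;

public class dependencydemo {
  public static void main(String[] args) throws IOException {
    // Write to the file named by the second argument, or to stdout.
    PrintWriter out;
    if (args.length > 1) {
      out = new PrintWriter(args[1]);
    } else {
      out = new PrintWriter(System.out);
    }

    StanfordCoreNLP pipeline = new StanfordCoreNLP();

    // Read the text from the file named by the first argument, or use a sample.
    Annotation annotation;
    if (args.length > 0) {
      annotation = new Annotation(IOUtils.slurpFileNoExceptions(args[0]));
    } else {
      annotation = new Annotation("Yesterday, I went to the Dallas Country Club to play 25 cent Bingo.  While I was there I talked to my friend Jim and we both agree that those people in Washington are destroying our economy.");
    }

    pipeline.annotate(annotation);
    pipeline.prettyPrint(annotation, out);

    // Print the constituency parse of the first sentence only.
    List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
    if (sentences != null && sentences.size() > 0) {
      CoreMap sentence = sentences.get(0);
      Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
      // out.println();
      // out.println("The first sentence parsed is:");
      tree.pennPrint(out);
    }
    // Flush so that buffered output is not lost when main returns.
    out.flush();
  }
}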

Output:

(ROOT
  (S
    (NP-TMP (NN Yesterday))
    (, ,)
    (NP (PRP I))
    (VP (VBD went)
      (PP (TO to)
        (NP (DT the) (NNP Dallas) (NNP Country) (NNP Club)))
      (S
        (VP (TO to)
          (VP (VB play)
            (S
              (NP (CD 25) (NN cent))
              (NP (NNP Bingo)))))))
    (. .)))

Dependencies:

root(ROOT-0, went-4)
tmod(went-4, Yesterday-1)
nsubj(went-4, I-3)
det(Club-9, the-6)
nn(Club-9, Dallas-7)
nn(Club-9, Country-8)
prep_to(went-4, Club-9)
aux(play-11, to-10)
xcomp(went-4, play-11)
num(cent-13, 25-12)
nsubj(Bingo-14, cent-13)
xcomp(play-11, Bingo-14)

How can I extract the head nouns from this? Besides that, the output does not look correct.

1 answer:

Answer 0 (score: 1)

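From the explanation in your comments, I get the impression that you want the head constituents of all noun phrases. This is easy to do with CoreNLP.

  1. First, find all the noun phrases. You can do this with a simple Tregex pattern (see Chris Manning's relevant answer).
  2. Then use a CoreNLP "head finder" to pick out the syntactic head constituent of each matched noun phrase. See e.g. ModCollinsHeadFinder.
  3. Demo code follows.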

    // Fetch a head finder.
    HeadFinder hf = new PennTreebankLanguagePack().headFinder();
    
    Tree myTree = ...
    TregexPattern tPattern = TregexPattern.compile("NP");
    TregexMatcher tMatcher = tPattern.matcher(myTree);
    while (tMatcher.find()) {
      Tree nounPhrase = tMatcher.getMatch();
    
      Tree headConstituent = hf.determineHead(nounPhrase);
      System.out.println(headConstituent);
    }
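
To make the idea of "the head of a noun phrase" concrete without needing CoreNLP on the classpath, here is a minimal standalone sketch. It is not CoreNLP's API and does not reproduce the Collins head rules: the `HeadNounSketch` class, its `Node` type, and the rule "take the rightmost noun-tagged child, otherwise descend along the rightmost child" are all assumptions made up for illustration. It reads a bracketed parse like the one in the question and prints one head word per node labelled exactly `NP` (so `NP-TMP` is skipped).

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: this is NOT CoreNLP's HeadFinder or its head rules.
class HeadNounSketch {

  // Minimal parse-tree node: a label plus children (a leaf has no children).
  static final class Node {
    final String label;
    final List<Node> children = new ArrayList<>();
    Node(String label) { this.label = label; }
    boolean isLeaf() { return children.isEmpty(); }
    boolean isPreTerminal() { return children.size() == 1 && children.get(0).isLeaf(); }
  }

  private final String[] tokens;
  private int pos = 0;

  private HeadNounSketch(String penn) {
    // Pad brackets with spaces so that each one becomes its own token.
    tokens = penn.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
  }

  // Recursive-descent reader for "(LABEL child child ...)" or a bare word.
  private Node parse() {
    if (tokens[pos].equals("(")) {
      pos++;                                  // consume "("
      Node node = new Node(tokens[pos++]);    // the constituent label
      while (!tokens[pos].equals(")")) {
        node.children.add(parse());
      }
      pos++;                                  // consume ")"
      return node;
    }
    return new Node(tokens[pos++]);           // a word (leaf)
  }

  // Simplified head rule (an assumption, far cruder than real head rules).
  static String headWord(Node np) {
    // Pass 1: rightmost child tagged NN, NNS, NNP or NNPS.
    for (int i = np.children.size() - 1; i >= 0; i--) {
      Node child = np.children.get(i);
      if (child.isPreTerminal() && child.label.startsWith("NN")) {
        return child.children.get(0).label;
      }
    }
    // Pass 2: no noun child, so follow the rightmost child down to a word.
    Node n = np;
    while (!n.isLeaf()) {
      n = n.children.get(n.children.size() - 1);
    }
    return n.label;
  }

  // Pre-order walk collecting a head word for every node labelled exactly "NP".
  private static void collect(Node n, List<String> heads) {
    if (n.label.equals("NP") && !n.isLeaf()) {
      heads.add(headWord(n));
    }
    for (Node child : n.children) {
      collect(child, heads);
    }
  }

  static List<String> headWords(String penn) {
    Node root = new HeadNounSketch(penn).parse();
    List<String> heads = new ArrayList<>();
    collect(root, heads);
    return heads;
  }

  public static void main(String[] args) {
    // The first-sentence parse from the question, written on fewer lines.
    String penn =
        "(ROOT (S (NP-TMP (NN Yesterday)) (, ,) (NP (PRP I)) "
        + "(VP (VBD went) (PP (TO to) (NP (DT the) (NNP Dallas) (NNP Country) (NNP Club))) "
        + "(S (VP (TO to) (VP (VB play) (S (NP (CD 25) (NN cent)) (NP (NNP Bingo))))))) "
        + "(. .)))";
    System.out.println(headWords(penn)); // prints [I, Club, cent, Bingo]
  }
}
```

A real head finder such as the one returned by `PennTreebankLanguagePack` applies per-category rule tables and returns a child constituent rather than a bare word, so treat this only as a picture of what the Tregex-plus-head-finder loop is doing.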