如何使用阿拉伯语模型在斯坦福解析器中提取NP子树?

时间:2018-07-08 16:01:01

标签: java stanford-nlp

来自 (根(S(VP(VBDذهب)(NP(NNPاحمد))(PP(INالي)(NP(DTNNالمدرسه)))))

1 个答案:

答案 0 :(得分:0)

通常,我建议使用完整的Stanford CoreNLP。

您可以在这里获得:https://stanfordnlp.github.io/CoreNLP/

以下是一些使用Java API查找特定成分的示例代码:

package edu.stanford.nlp.examples;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.util.*;

import java.util.*;

public class ConstituentExample {

  public static void main(String[] args) {
    // set up pipeline properties
    Properties props = StringUtils.argsToProperties("-props", "StanfordCoreNLP-arabic.properties");
    // set up Stanford CoreNLP pipeline
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // build annotation for a review
    Annotation annotation =
        new Annotation("...");
    // annotate
    pipeline.annotate(annotation);
    // get tree
    Tree tree =
        annotation.get(CoreAnnotations.SentencesAnnotation.class).get(0).get(TreeCoreAnnotations.TreeAnnotation.class);
    System.out.println(tree);
    Set<Constituent> treeConstituents = tree.constituents(new LabeledScoredConstituentFactory());
    for (Constituent constituent : treeConstituents) {
      if (constituent.label() != null &&
          (constituent.label().toString().equals("VP") || constituent.label().toString().equals("NP"))) {
        System.err.println("found constituent: "+constituent.toString());
        System.err.println(tree.getLeaves().subList(constituent.start(), constituent.end()+1));
      }
    }
  }
}