Simple CoreNLP - How do I add all nouns to an array?

Asked: 2017-06-10 05:14:30

Tags: java arrays parsing stanford-nlp

I am using Stanford Simple NLP. I need to add all the nouns to the nounPhrases collection. The me() method gives the following output:

The parse of the sentence 'I like java and python' is (ROOT (S (NP (PRP I)) (VP (VBP like) (NP (NN java) (CC and) (NN python)))))

Here is my method:

public String s = "I like java and python";

public static Set<String> nounPhrases = new HashSet<>();

public void me() {

    Document doc = new Document(" " + s);
    for (Sentence sent : doc.sentences()) {

        System.out.println("The parse of the sentence '" + sent + "' is " + sent.parse());

        if (sent.parse().equals("NN") || sent.parse().equals("NNS") || sent.parse().equals("NNP")
                || sent.parse().equals("NNPS")) {

            // I need to assign all nouns to the array nounPhrases

        }

    }
}

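I'm not sure whether my condition is right or wrong, because I'm new to Stanford NLP. Please help me get the nouns into this array.

I got the sample code from the URL below and customized it a little.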

Simple CoreNLP

2 answers:

Answer 0 (score: 1)

In case anyone needs a complete, up-to-date version of this solution, here it is:

import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.CoreSentence;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;


public class BasicPipelineExample4 {

  public static String text = "Joe Smith was born in California. " +
      "Study studying studied. " +
      "In 2017, he went to Paris, France in the summer. " +
      "His flight left at 3:00pm on July 10th, 2017. " +
      "After eating some escargot for the first time, Joe said, \"That was delicious!\" " +
      "He sent a postcard to his sister Jane Smith. " +
      "He is ok. " +
      "Simple, right? Remove removed removing was were is are element at given gave give index, insert it at desired index. Let's see if it works for the second test case. " +
      "He is ok to go now. " +
      "After hearing about Joe's trip, Jane decided she might go to France one day.";

public static void main(String[] args) {
    Properties props = new Properties();
    // set the list of annotators to run: "pos" supplies the POS tags,
    // "parse" supplies the constituency tree printed below
    props.setProperty("annotators", "tokenize,ssplit,pos,parse");
    // build pipeline
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // create a document object
    CoreDocument doc = new CoreDocument(text);
    // annotate the document
    pipeline.annotate(doc);

    Set<String> nounPhrases = new HashSet<>();

    for (CoreSentence sent : doc.sentences()) {

        System.out.println("The parse of the sentence '" + sent + "' is " + sent.constituencyParse());
        // Iterate over every word in the sentence
        for (int i = 0; i < sent.tokens().size(); i++) {
            // Condition: the word is a noun (its POS tag starts with "NN": NN, NNS, NNP, NNPS)
            if (sent.posTags() != null && sent.posTags().get(i) != null && sent.posTags().get(i).startsWith("NN")) {
                // Put the word into the Set
                nounPhrases.add(sent.tokens().get(i).originalText());
            }
        }
    }

    System.out.println("Nouns: " + nounPhrases);

}

}
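Since the code above also prints the constituency parse, here is a rough sketch of how the nouns could be read back out of that bracketed string, such as the one shown in the question. This is only an illustration in plain Java: `ParseStringNouns` and its `nouns` method are hypothetical names, not part of CoreNLP, and in a real pipeline it is usually more robust to work with the POS tags or the tree object than with its printed form.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a CoreNLP class): pulls noun leaves out of a
// bracketed parse string such as "(NP (NN java) (CC and) (NN python))".
class ParseStringNouns {

    // Matches a tagged leaf like "(NN java)" or "(NNPS Smiths)".
    // "NNP?S?" covers the four noun tags NN, NNS, NNP and NNPS.
    private static final Pattern NOUN_LEAF =
            Pattern.compile("\\((NNP?S?) ([^()\\s]+)\\)");

    // Returns the nouns in the order they first appear in the parse string.
    static Set<String> nouns(String bracketedParse) {
        Set<String> result = new LinkedHashSet<>();
        Matcher m = NOUN_LEAF.matcher(bracketedParse);
        while (m.find()) {
            result.add(m.group(2)); // group 2 is the word, group 1 the tag
        }
        return result;
    }

    public static void main(String[] args) {
        String parse = "(ROOT (S (NP (PRP I)) (VP (VBP like) "
                + "(NP (NN java) (CC and) (NN python)))))";
        System.out.println("Nouns: " + nouns(parse)); // prints "Nouns: [java, python]"
    }
}
```

A LinkedHashSet is used here instead of a HashSet so the nouns come out in sentence order. Phrase labels such as NP are not matched, because the pattern requires the label to begin with "NN".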

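Answer 1 (score: 0)

Your condition is almost right. You want every word whose POS tag is one of the "NN" tags (NN, NNS, NNP, NNPS), i.e. every noun. The problem is that sent.parse() returns the parse tree for the whole sentence, so comparing it with a tag string such as "NN" never matches. To check the POS tag of each word, you have to iterate over every word in the sentence. Based on your code, it could look like this: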

public String s = "I like java and python";

public static Set<String> nounPhrases = new HashSet<>();

public void me() {

    Document doc = new Document(" " + s);
    for (Sentence sent : doc.sentences()) {

        System.out.println("The parse of the sentence '" + sent + "' is " + sent.parse());
        //Iterate over every word in the sentence
        for(int i = 0; i < sent.words().size(); i++) {
            //Condition: the word is a noun (its POS tag starts with "NN")
            if (sent.posTag(i) != null && sent.posTag(i).startsWith("NN")) {
                //Put the word into the Set
                nounPhrases.add(sent.word(i));
            }
        }
    }
}
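To try the tag check on its own, without loading any CoreNLP models, a minimal sketch along these lines may help. It assumes you already have the words and their POS tags as two parallel lists (in the snippet above those would come from sent.words() and sent.posTag(i)). `NounTagFilter` is a hypothetical helper name, not a CoreNLP class.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of CoreNLP): keeps the words whose
// Penn Treebank POS tag is a noun tag.
class NounTagFilter {

    // Returns the words whose tag starts with "NN" (NN, NNS, NNP, NNPS),
    // without duplicates and in first-seen order.
    static Set<String> nouns(List<String> words, List<String> posTags) {
        if (words.size() != posTags.size()) {
            throw new IllegalArgumentException("words and posTags must have the same length");
        }
        Set<String> result = new LinkedHashSet<>();
        for (int i = 0; i < words.size(); i++) {
            String tag = posTags.get(i);
            if (tag != null && tag.startsWith("NN")) {
                result.add(words.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Words and tags copied from the parse shown in the question.
        List<String> words = Arrays.asList("I", "like", "java", "and", "python");
        List<String> tags = Arrays.asList("PRP", "VBP", "NN", "CC", "NN");
        System.out.println("Nouns: " + nouns(words, tags)); // prints "Nouns: [java, python]"
    }
}
```

Using startsWith("NN") rather than contains("NN") keeps the check limited to the four noun tags, and returning a Set means a noun that occurs several times is stored only once.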