Question

I am using Stanford Simple NLP. I need to add all the nouns to the nounPhrases array. The me() method gives the following output:

The parse of the sentence 'I like java and python' is (ROOT (S (NP (PRP I)) (VP (VBP like) (NP (NN java) (CC and) (NN python)))))

Here is my method:

public String s = "I like java and python";

public static Set<String> nounPhrases = new HashSet<>();

public void me() {

    Document doc = new Document(" " + s);
    for (Sentence sent : doc.sentences()) {

        System.out.println("The parse of the sentence '" + sent + "' is " + sent.parse());

        if (sent.parse().equals("NN") || sent.parse().equals("NNS") || sent.parse().equals("NNP")
                || sent.parse().equals("NNPS")) {

            // I need to assign all nouns to the array nounPhrases

        }

    }
}

I am not sure whether my condition is right or wrong, since I am new to Stanford NLP. Please help me get the nouns into this array.

I took the sample code from the URL below and customized it a little.

Simple CoreNLP

Answer 1

In case anyone needs a complete, up-to-date version of this solution, here it is:

import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.CoreSentence;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;


public class BasicPipelineExample4 {

  public static String text = "Joe Smith was born in California. " +
      "Study studying studied. " +
      "In 2017, he went to Paris, France in the summer. " +
      "His flight left at 3:00pm on July 10th, 2017. " +
      "After eating some escargot for the first time, Joe said, \"That was delicious!\" " +
      "He sent a postcard to his sister Jane Smith. " +
      "He is ok. " +
      "Simple, right? Remove removed removing was were is are element at given gave give index, insert it at desired index. Let's see if it works for the second test case. " +
      "He is ok to go now. " +
      "After hearing about Joe's trip, Jane decided she might go to France one day.";

public static void main(String[] args) {
    Properties props = new Properties();
    // set the list of annotators to run; "parse" is needed for constituencyParse()
    props.setProperty("annotators", "tokenize,ssplit,pos,parse");
    // build pipeline
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // create a document object
    CoreDocument doc = new CoreDocument(text);
    // annotate the document
    pipeline.annotate(doc);

    Set<String> nounPhrases = new HashSet<>();

    for (CoreSentence sent : doc.sentences()) {

        System.out.println("The parse of the sentence '" + sent + "' is " + sent.constituencyParse());
        // Iterate over every word in the sentence
        for (int i = 0; i < sent.tokens().size(); i++) {
            // Condition: the word is a noun (its POS tag starts with "NN")
            if (sent.posTags() != null && sent.posTags().get(i) != null && sent.posTags().get(i).startsWith("NN")) {
                // Put the word into the Set
                nounPhrases.add(sent.tokens().get(i).originalText());
            }
        }
    }

    System.out.println("Nouns: " + nounPhrases);

}

}
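
The parse string printed above already carries a tag for every word, for example (NN java). As a rough sketch only, and not a CoreNLP API, the noun leaves can be pulled out of that printed string with a regular expression. The class name ParseStringNounsSketch and the helper nounsFromParse are hypothetical, the input is the parse string quoted in the question, and the pattern assumes that word tokens contain no parentheses or spaces. Reading the tags per token, as the loop above does, remains the more direct route.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParseStringNounsSketch {

    // Matches a leaf such as "(NN java)": a noun tag (NN, NNS, NNP or NNPS),
    // whitespace, then the word up to the closing parenthesis.
    private static final Pattern NOUN_LEAF =
            Pattern.compile("\\((NNP?S?)\\s+([^()\\s]+)\\)");

    // Hypothetical helper: collects the words of all noun leaves in a
    // bracketed parse string, in order of appearance.
    static Set<String> nounsFromParse(String parse) {
        Set<String> nouns = new LinkedHashSet<>();
        Matcher m = NOUN_LEAF.matcher(parse);
        while (m.find()) {
            nouns.add(m.group(2));
        }
        return nouns;
    }

    public static void main(String[] args) {
        // Parse string copied from the question
        String parse = "(ROOT (S (NP (PRP I)) (VP (VBP like) "
                + "(NP (NN java) (CC and) (NN python)))))";
        System.out.println("Nouns: " + nounsFromParse(parse));
    }
}
```

Note that (NP ...) is a phrase label, not a word tag, so the pattern deliberately skips it and only matches leaves whose tag begins with NN.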

Answer 2

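Your condition is almost right. The problem is that sent.parse() returns the parse tree of the whole sentence, so comparing it with a tag string such as "NN" is never true. What you want is every word whose POS tag is a noun tag (NN, NNS, NNP, NNPS), and to check each word's POS tag you have to iterate over every word in the sentence. Based on your code, it could look like this: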

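import java.util.HashSet;
import java.util.Set;

import edu.stanford.nlp.simple.Document;
import edu.stanford.nlp.simple.Sentence;

// The members below go inside your existing class
public String s = "I like java and python";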

public static Set<String> nounPhrases = new HashSet<>();

public void me() {

    Document doc = new Document(" " + s);
    for (Sentence sent : doc.sentences()) {

        System.out.println("The parse of the sentence '" + sent + "' is " + sent.parse());
        //Iterate over every word in the sentence
        for(int i = 0; i < sent.words().size(); i++) {
            // Condition: the word is a noun (its POS tag starts with "NN")
            if (sent.posTag(i) != null && sent.posTag(i).startsWith("NN")) {
                //Put the word into the Set
                nounPhrases.add(sent.word(i));
            }
        }
    }
}
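
The tag check itself can be tried without loading any CoreNLP models. The following is a minimal sketch, not part of the Simple CoreNLP API: the class name NounFilterSketch and the helper collectNouns are hypothetical, and the words and tags are hard-coded from the parse shown in the question instead of coming from sent.words() and sent.posTag(i).

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class NounFilterSketch {

    // Hypothetical helper: keeps every word whose Penn Treebank tag starts
    // with "NN" (NN, NNS, NNP, NNPS). Assumes words and tags are parallel lists.
    static Set<String> collectNouns(List<String> words, List<String> tags) {
        Set<String> nouns = new LinkedHashSet<>();
        for (int i = 0; i < words.size(); i++) {
            String tag = tags.get(i);
            if (tag != null && tag.startsWith("NN")) {
                nouns.add(words.get(i));
            }
        }
        return nouns;
    }

    public static void main(String[] args) {
        // Words and tags copied by hand from the parse of "I like java and python"
        List<String> words = Arrays.asList("I", "like", "java", "and", "python");
        List<String> tags = Arrays.asList("PRP", "VBP", "NN", "CC", "NN");
        System.out.println("Nouns: " + collectNouns(words, tags)); // prints "Nouns: [java, python]"
    }
}
```

A LinkedHashSet is used here only so that the printed order matches the sentence order; the HashSet from the answer works the same way if order does not matter.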

Simple CoreNLP - how do I add all nouns to an array?
