I am using Stanford Simple NLP. I need to add all the nouns to the nounPhrases set. The me() method produces the following output:
The parse of the sentence 'I like java and python' is (ROOT (S (NP (PRP I)) (VP (VBP like) (NP (NN java) (CC and) (NN python)))))
Here is my method:
public String s = "I like java and python";
public static Set<String> nounPhrases = new HashSet<>();

public void me() {
    Document doc = new Document(" " + s);
    for (Sentence sent : doc.sentences()) {
        System.out.println("The parse of the sentence '" + sent + "' is " + sent.parse());
        if (sent.parse().equals("NN") || sent.parse().equals("NNS") || sent.parse().equals("NNP")
                || sent.parse().equals("NNPS")) {
            // I need to add all nouns to the set nounPhrases
        }
    }
}
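I am not sure whether my condition is right or wrong, since I am new to Stanford NLP. Please help me get the nouns into this set.
I took the sample code from the URL below and customized it a little.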
Answer 0 (score: 1)
If anyone needs a complete, up-to-date version of this solution, here it is:
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.CoreSentence;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class BasicPipelineExample4 {

    public static String text = "Joe Smith was born in California. " +
            "Study studying studied. " +
            "In 2017, he went to Paris, France in the summer. " +
            "His flight left at 3:00pm on July 10th, 2017. " +
            "After eating some escargot for the first time, Joe said, \"That was delicious!\" " +
            "He sent a postcard to his sister Jane Smith. " +
            "He is ok. " +
            "Simple, right? Remove removed removing was were is are element at given gave give index, insert it at desired index. Let's see if it works for the second test case. " +
            "He is ok to go now. " +
            "After hearing about Joe's trip, Jane decided she might go to France one day.";

    public static void main(String[] args) {
        Properties props = new Properties();
        // Set the list of annotators to run; "pos" supplies the POS tags and
        // "parse" supplies the constituency parse printed below
        props.setProperty("annotators", "tokenize,ssplit,pos,parse");
        // Build the pipeline
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        // Create a document object
        CoreDocument doc = new CoreDocument(text);
        // Annotate the document
        pipeline.annotate(doc);

        Set<String> nounPhrases = new HashSet<>();
        for (CoreSentence sent : doc.sentences()) {
            System.out.println("The parse of the sentence '" + sent + "' is " + sent.constituencyParse());
            // Iterate over every word in the sentence
            for (int i = 0; i < sent.tokens().size(); i++) {
                // Condition: the word is a noun (its POS tag starts with "NN")
                if (sent.posTags() != null && sent.posTags().get(i) != null
                        && sent.posTags().get(i).startsWith("NN")) {
                    // Put the word into the Set
                    nounPhrases.add(sent.tokens().get(i).originalText());
                }
            }
        }
        System.out.println("Nouns: " + nounPhrases);
    }
}
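The filtering step does not depend on CoreNLP and can be tried on its own. What follows is only a minimal sketch, assuming the words and their POS tags are already available as two parallel lists (as the tokens and tags are in the loop above). The class name NounFilter and the method nounsOf are hypothetical names made up for this illustration, not part of any library.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of CoreNLP.
class NounFilter {

    // Returns the words whose POS tag starts with "NN" (NN, NNS, NNP, NNPS),
    // in order of first appearance and without duplicates.
    static Set<String> nounsOf(List<String> words, List<String> tags) {
        if (words.size() != tags.size()) {
            throw new IllegalArgumentException("words and tags must have the same length");
        }
        Set<String> nouns = new LinkedHashSet<>();
        for (int i = 0; i < words.size(); i++) {
            String tag = tags.get(i);
            if (tag != null && tag.startsWith("NN")) {
                nouns.add(words.get(i));
            }
        }
        return nouns;
    }

    public static void main(String[] args) {
        // Words and tags taken from the parse shown in the question
        List<String> words = Arrays.asList("I", "like", "java", "and", "python");
        List<String> tags = Arrays.asList("PRP", "VBP", "NN", "CC", "NN");
        System.out.println(nounsOf(words, tags)); // prints [java, python]
    }
}
```

A LinkedHashSet is used here instead of a HashSet only so that the nouns come out in sentence order; a plain HashSet works as well if order does not matter.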
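Answer 1 (score: 0)
Your condition is almost right. You want every word whose POS tag starts with "NN" (NN, NNS, NNP, or NNPS), i.e. every noun. However, sent.parse() returns the parse tree for the whole sentence, so comparing it with a tag string never matches. To check the POS tag of each word, you have to iterate over every word in the sentence. Based on your code, it could look like this: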
public String s = "I like java and python";
public static Set<String> nounPhrases = new HashSet<>();

public void me() {
    Document doc = new Document(" " + s);
    for (Sentence sent : doc.sentences()) {
        System.out.println("The parse of the sentence '" + sent + "' is " + sent.parse());
        // Iterate over every word in the sentence
        for (int i = 0; i < sent.words().size(); i++) {
            // Condition: the word is a noun (its POS tag starts with "NN")
            if (sent.posTag(i) != null && sent.posTag(i).startsWith("NN")) {
                // Put the word into the Set
                nounPhrases.add(sent.word(i));
            }
        }
    }
}
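For a quick check, the nouns can also be read off the bracketed parse string printed in the question. The sketch below uses only java.util.regex and assumes the string has exactly the Penn Treebank bracket format shown in that output. The class name ParseNouns and the method nounsIn are hypothetical, and this works on the printed text rather than through any CoreNLP API, so the per-word POS tag loop above remains the more robust route.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of CoreNLP.
class ParseNouns {

    // Matches a leaf such as "(NN java)" or "(NNPS Smiths)":
    // group 1 is the noun tag, group 2 is the word.
    private static final Pattern NOUN_LEAF =
            Pattern.compile("\\((NN[A-Z]*)\\s+([^()\\s]+)\\)");

    // Collects the words of all noun leaves, in order of first appearance.
    static Set<String> nounsIn(String bracketedParse) {
        Set<String> nouns = new LinkedHashSet<>();
        Matcher m = NOUN_LEAF.matcher(bracketedParse);
        while (m.find()) {
            nouns.add(m.group(2));
        }
        return nouns;
    }

    public static void main(String[] args) {
        // Parse string copied from the output in the question
        String parse = "(ROOT (S (NP (PRP I)) (VP (VBP like) (NP (NN java) (CC and) (NN python)))))";
        System.out.println(nounsIn(parse)); // prints [java, python]
    }
}
```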