I want to extract the NN or NNS tokens from the sample text given in the script below. When I run the code below, the output is:
types
synchronization
phase
synchronization
-RSB-
synchronization
-LSB-
-RSB-
projection
synchronization
Why am I getting [-RSB-] or [-LSB-]? Should I use a different pattern to match both NN and NNS?
String atic = "So far, many different types of synchronization have been investigated, such as complete synchronization [8], generalized synchronization [9], phase synchronization [10], lag synchronization [11], projection synchronization [12, 13], and so forth.";
Reader reader = new StringReader(atic);
DocumentPreprocessor dp = new DocumentPreprocessor(reader);
docs_terms_unq.put(rs.getString("u"), new ArrayList<String>());
docs_terms.put(rs.getString("u"), new ArrayList<String>());
for (List<HasWord> sentence : dp) {
    List<TaggedWord> tagged = tagger.tagSentence(sentence);
    GrammaticalStructure gs = parser.predict(tagged);
    // Constituency parse of the tokenized sentence
    Tree x = parserr.parse(sentence);
    System.out.println(x);
    // Match every node whose basic category is NN or NNS
    TregexPattern NPpattern = TregexPattern.compile("@NN|NNS");
    TregexMatcher matcher = NPpattern.matcher(x);
    while (matcher.findNextMatchingNode()) {
        Tree match = matcher.getMatch();
        // The yield of a preterminal node is the single word it dominates
        ArrayList<Label> hh = match.yield();
        System.out.println(hh.toString());
    }
}
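Answer 0 (score: 1)
I don't know why this happens. However, you will get more accurate POS tags if you use the part-of-speech tagger. I suggest reading the tags directly from the Annotation. Here is some sample code.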
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

import java.util.Properties;

public class NNExample {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        String text = "So far, many different types of synchronization have been investigated, such as complete " +
                "synchronization [8], generalized synchronization [9], phase synchronization [10], " +
                "lag synchronization [11], projection synchronization [12, 13], and so forth.";
        Annotation annotation = new Annotation(text);
        pipeline.annotate(annotation);
        for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
            for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                String partOfSpeechTag = token.get(CoreAnnotations.PartOfSpeechAnnotation.class);
                if (partOfSpeechTag.equals("NN") || partOfSpeechTag.equals("NNS")) {
                    System.out.println(token.word());
                }
            }
        }
    }
}
This is the output I get:
types
synchronization
synchronization
synchronization
phase
synchronization
lag
synchronization
projection
synchronization
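For context, -LSB- and -RSB- are the Penn Treebank escapes for "[" and "]"; whether the tokenizer emits them depends on the CoreNLP version and tokenizer options. If a parser or tagger still labels such a token as NN, one workaround is to drop bracket escapes after tagging. The following is a minimal sketch in plain Java, not part of CoreNLP: the class PtbBracketFilter, the method keepNouns, and the hand-written tag list are all hypothetical and only mimic the mis-tagging reported in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PtbBracketFilter {

    // Penn Treebank escapes for round, square, and curly brackets
    private static final Set<String> PTB_BRACKETS = new HashSet<>(Arrays.asList(
            "-LRB-", "-RRB-", "-LSB-", "-RSB-", "-LCB-", "-RCB-"));

    /**
     * Keeps the words tagged NN or NNS and skips bracket escapes.
     * words and tags are parallel lists of equal length (an assumption of this sketch).
     */
    public static List<String> keepNouns(List<String> words, List<String> tags) {
        List<String> nouns = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            String tag = tags.get(i);
            boolean isNoun = tag.equals("NN") || tag.equals("NNS");
            if (isNoun && !PTB_BRACKETS.contains(words.get(i))) {
                nouns.add(words.get(i));
            }
        }
        return nouns;
    }

    public static void main(String[] args) {
        // Hand-written tags, not real tagger output: the brackets are deliberately tagged NN
        List<String> words = Arrays.asList("phase", "synchronization", "-LSB-", "10", "-RSB-");
        List<String> tags = Arrays.asList("NN", "NN", "NN", "CD", "NN");
        System.out.println(keepNouns(words, tags)); // prints [phase, synchronization]
    }
}
```

The same check could be applied inside the loop of the example above by testing token.word() against the escape set before printing.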
Answer 1 (score: 1)
Here is an example of getting the NPs (noun phrases) from a sentence:
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.Word;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.*;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Properties;

public class TreeExample {

    public static void printNounPhrases(Tree inputTree) {
        if (inputTree.label().value().equals("NP")) {
            ArrayList<Word> words = new ArrayList<Word>();
            for (Tree leaf : inputTree.getLeaves()) {
                words.addAll(leaf.yieldWords());
            }
            System.out.println(words);
        } else {
            for (Tree subTree : inputTree.children()) {
                printNounPhrases(subTree);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        String text = "Susan Thompson is from Florida.";
        Annotation annotation = new Annotation(text);
        pipeline.annotate(annotation);
        Tree sentenceTree = annotation.get(CoreAnnotations.SentencesAnnotation.class).get(0).get(
                TreeCoreAnnotations.TreeAnnotation.class);
        //System.out.println(sentenceTree);
        printNounPhrases(sentenceTree);
    }
}
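Note that printNounPhrases stops descending as soon as it reaches an NP, so an NP nested inside another NP is printed only as part of the outer phrase. If nested NPs are wanted as well, the recursion can continue into the children after recording a match. The sketch below illustrates that variant in plain Java; the Node class is a hypothetical stand-in for CoreNLP's Tree (it is not the library's API), and the sample tree is built by hand rather than produced by a parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NounPhraseSketch {

    /** Hypothetical minimal tree node; a stand-in for a parse tree, not CoreNLP's Tree. */
    public static class Node {
        final String label;
        final List<Node> children;

        public Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    /** Appends the words under a node to out, left to right. */
    static void leaves(Node node, List<String> out) {
        if (node.isLeaf()) {
            out.add(node.label);
            return;
        }
        for (Node child : node.children) {
            leaves(child, out);
        }
    }

    /** Collects every NP in pre-order, including NPs nested inside other NPs. */
    public static List<List<String>> allNounPhrases(Node root) {
        List<List<String>> result = new ArrayList<>();
        collect(root, result);
        return result;
    }

    private static void collect(Node node, List<List<String>> result) {
        if (node.label.equals("NP") && !node.isLeaf()) {
            List<String> words = new ArrayList<>();
            leaves(node, words);
            result.add(words);
        }
        // Unlike the example above, keep descending even after a match
        for (Node child : node.children) {
            collect(child, result);
        }
    }

    /** Hand-built tree for "many different types of synchronization". */
    public static Node sample() {
        return new Node("NP",
                new Node("NP",
                        new Node("JJ", new Node("many")),
                        new Node("JJ", new Node("different")),
                        new Node("NNS", new Node("types"))),
                new Node("PP",
                        new Node("IN", new Node("of")),
                        new Node("NP", new Node("NN", new Node("synchronization")))));
    }

    public static void main(String[] args) {
        // prints [[many, different, types, of, synchronization], [many, different, types], [synchronization]]
        System.out.println(allNounPhrases(sample()));
    }
}
```

Whether to stop at the outermost NP or to collect nested ones depends on the use case; for term extraction the innermost phrases are often the more useful ones.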