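I've just started using CoreNLP's TokenSequencePattern and can't get even a simple match to work. All I'm trying to do is match a token in the input text. The code below runs without errors but matches nothing. However, if you change the match expression to [], it matches both sentences.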
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, parse");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
Annotation document = new Annotation("This is sent 1. And here is sent 2");
pipeline.annotate(document);
List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
Env env = TokenSequencePattern.getNewEnv();
env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE);
env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);
TokenSequencePattern pattern = TokenSequencePattern.compile(env,"[ { word:\"sent\" } ]");
TokenSequenceMatcher matcher = pattern.getMatcher(sentences);
while (matcher.find()) {
    System.out.println(matcher.group());
}
Thanks!
Answer 0 (score: -1)
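// Match against the document's tokens, not its sentences: getMatcher treats
// each list element as one node, so a list of sentences is matched sentence by sentence.
List<CoreLabel> tokens = document.get(CoreAnnotations.TokensAnnotation.class);
TokenSequencePattern pattern = TokenSequencePattern.compile("[ { word:\"sent\" } ]");
TokenSequenceMatcher matcher = pattern.getMatcher(tokens);
while (matcher.find()) {
    String matchedString = matcher.group();
    // groupNodes() returns the matched nodes; the wildcard type accepts its declared return type.
    List<? extends CoreMap> matchedTokens = matcher.groupNodes();
    System.out.println(matchedString + " " + matchedTokens);
}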
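To see why the sentence list gives no match while the token list does, here is a rough, CoreNLP-free sketch. It is only a simplified model, not the library's implementation: the class `NodeMatchSketch`, the method `matchWord`, and the use of a plain `Map` with a `"word"` key in place of a `CoreMap` are all hypothetical stand-ins. It assumes, as I understand it, that `word` is compared against the text of each list element, so a sentence node carries the whole sentence text and never equals "sent".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class NodeMatchSketch {
    // Hypothetical stand-in for a single-node pattern such as [ { word:"sent" } ]:
    // returns every node whose "word" value equals the target, ignoring case.
    public static List<Map<String, String>> matchWord(List<Map<String, String>> nodes, String target) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> node : nodes) {
            String word = node.get("word");
            if (word != null && word.equalsIgnoreCase(target)) {
                hits.add(node);
            }
        }
        return hits;
    }

    // One node per sentence: the "word" value holds the whole sentence text (assumption).
    public static List<Map<String, String>> sentenceNodes() {
        return Arrays.asList(
            Collections.singletonMap("word", "This is sent 1."),
            Collections.singletonMap("word", "And here is sent 2"));
    }

    // One node per token: the "word" value holds a single token.
    public static List<Map<String, String>> tokenNodes() {
        List<Map<String, String>> nodes = new ArrayList<>();
        for (String token : "This is sent 1 . And here is sent 2".split(" ")) {
            nodes.add(Collections.singletonMap("word", token));
        }
        return nodes;
    }

    public static void main(String[] args) {
        System.out.println(matchWord(sentenceNodes(), "sent").size()); // prints 0
        System.out.println(matchWord(tokenNodes(), "sent").size());    // prints 2
    }
}
```

In this model, a pattern that accepts any node would return both sentence nodes, which is consistent with [] matching two sentences in the question.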