Question

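I've just started using CoreNLP's TokenSequencePattern and I can't get even a simple match to work. All I'm trying to do is match a token in the input text. The code below runs without errors but matches nothing. However, if I change the match expression to [], it matches both sentences.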

     Properties props = new Properties();
     props.put("annotators", "tokenize, ssplit, parse");
     StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
     Annotation document = new Annotation("This is sent 1. And here is sent 2");
     pipeline.annotate(document);
     List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);

     Env env = TokenSequencePattern.getNewEnv();
     env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE);
     env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);

     TokenSequencePattern pattern = TokenSequencePattern.compile(env,"[ { word:\"sent\" } ]");
     TokenSequenceMatcher matcher = pattern.getMatcher(sentences);

     while ( matcher.find() ) {
        System.out.println( matcher.group() );
     }

Thanks!

Answer 1

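Get the tokens from the document and pass those to the matcher instead of the list of sentences:

     // Match against the token list, not the sentence list.
     List<CoreLabel> tokens = document.get(CoreAnnotations.TokensAnnotation.class);
     TokenSequencePattern pattern = TokenSequencePattern.compile("[ { word:\"sent\" } ]");
     TokenSequenceMatcher matcher = pattern.getMatcher(tokens);
     while (matcher.find()) {
         String matchedString = matcher.group();
         // The tokens (as CoreMaps) covered by this match.
         List<? extends CoreMap> matchedTokens = matcher.groupNodes();
         System.out.println(matchedString + " " + matchedTokens);
     }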
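
To see why the list handed to the matcher matters, here is a rough plain-Java analogy (no CoreNLP involved). It assumes the matcher tests the pattern against one list element at a time, using that element's full text: with a list of sentences each "element" is a whole sentence, so a bare [] can match it but word:"sent" cannot. The class name, the helper findMatches, and the hand-written token list are made up for illustration; the real tokenizer output may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ElementMatchSketch {

    // Returns every element whose full text equals `word`, ignoring case.
    // Assumption: this mimics testing a node pattern like { word:"sent" }
    // against one list element at a time.
    public static List<String> findMatches(List<String> elements, String word) {
        List<String> matches = new ArrayList<>();
        for (String element : elements) {
            if (element.equalsIgnoreCase(word)) {
                matches.add(element);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Stand-in for the sentence list: each element is a whole sentence.
        List<String> sentences = Arrays.asList("This is sent 1.", "And here is sent 2");
        // Stand-in for the token list (hand-written, hypothetical tokenization).
        List<String> tokens = Arrays.asList(
                "This", "is", "sent", "1", ".", "And", "here", "is", "sent", "2");

        // No whole sentence is equal to "sent", so nothing is found here.
        System.out.println(findMatches(sentences, "sent"));
        // Individual tokens can be equal to "sent", so both occurrences are found.
        System.out.println(findMatches(tokens, "sent"));
    }
}
```

If you want the matches grouped by sentence, the same idea should carry over: take the token list of each sentence and match against that, rather than against the sentence objects themselves.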

How to use TokenSequencePattern
