Question

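I would like to use Stanford CoreNLP for lemmatization, but there are some words that I do not want lemmatized. Is there a way to provide such an ignore list to the tool? I am following this code, and once the program calls {{1}}, that's it; replacing the affected occurrences afterwards is hard. One solution is to build a mapping in which each word to ignore is paired with lemmatize(word), i.e. d = {(w1, lemmatize(w1)), (w2, lemmatize(w2)), ...}, and to use this mapping for post-processing. But I guess there should be an easier way than this.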

Thanks for your help.

Answer 1

I think I found a solution with the help of a friend.

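    import edu.stanford.nlp.ling.CoreAnnotations.LemmaAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.OriginalTextAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.util.CoreMap;

    // `sentences` is the list of sentences taken from the annotated document
    for (CoreMap sentence : sentences) {
        // Iterate over all tokens in a sentence
        for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
            // Print the original word and its lemma, separated by a tab
            System.out.print(token.get(OriginalTextAnnotation.class) + "\t");
            System.out.println(token.get(LemmaAnnotation.class));
        }
    }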

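You can call token.get(OriginalTextAnnotation.class) to get the original form of the word, so for any word on your ignore list you can use that value instead of the lemma.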
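As a rough sketch of how that could be combined with an ignore list: the class name `LemmaIgnoreList`, the helper `lemmaOrOriginal`, and the words in `IGNORE` are all made up for illustration, and the strings passed in `main` are hand-written stand-ins for what `token.get(OriginalTextAnnotation.class)` and `token.get(LemmaAnnotation.class)` would return. The sketch assumes a case-insensitive match is what you want.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class LemmaIgnoreList {
    // Hypothetical ignore list; replace with the words that must not be lemmatized.
    public static final Set<String> IGNORE =
            new HashSet<>(Arrays.asList("data", "media"));

    /**
     * Returns the original text when the word is on the ignore list
     * (compared in lower case), and the lemma otherwise.
     */
    public static String lemmaOrOriginal(String originalText, String lemma, Set<String> ignore) {
        return ignore.contains(originalText.toLowerCase(Locale.ROOT)) ? originalText : lemma;
    }

    public static void main(String[] args) {
        // Stand-in values; in the loop above they would come from the token's
        // OriginalTextAnnotation and LemmaAnnotation.
        System.out.println(lemmaOrOriginal("data", "datum", IGNORE));   // prints "data"
        System.out.println(lemmaOrOriginal("running", "run", IGNORE));  // prints "run"
    }
}
```

Inside the token loop, the call would look something like `lemmaOrOriginal(token.get(OriginalTextAnnotation.class), token.get(LemmaAnnotation.class), IGNORE)`. This keeps the pipeline untouched and avoids precomputing a word-to-lemma mapping, since the original text is already stored on each token.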

Words to ignore in the lemmatizer
