Question

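The real patterns are not written in English, so I created this simplified example to reproduce the problem: there are 3 levels of annotation (required for the real application), and the 3rd-level pattern does not work as expected. The phrase to recognize is: a b c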

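What I expect:

1st level: "a" is annotated as A, "b" is annotated as B
2nd: if there are annotations A and B, annotate them all as AB
3rd: if there is at least one AB annotation and there is the word "c", annotate them all as C

The patterns are shown below.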

# 1.
{  pattern: (/a/), action: (Annotate($0, name, "A")) }
{  pattern: (/b/), action: (Annotate($0, name, "B")) }
# 2.
{  pattern: (([name:A]) ([name:B])), action: (Annotate($0, name, "AB")) }
# 3.
{  pattern: (([name:AB]+) /c/), action: (Annotate($0, name, "C")) }

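#1 and #2 work, and "a b" is annotated:

matched token: NamedEntitiesToken{word='a' name='AB' beginPosition=0 endPosition=1}
matched token: NamedEntitiesToken{word='b' name='AB' beginPosition=2 endPosition=3}

However, pattern #3 does not work, even though we can see that there are 2 tokens annotated "AB", which is exactly what pattern #3 expects. If I change #1 to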

{  pattern: (/a/), action: (Annotate($0, name, "AB")) }
{  pattern: (/b/), action: (Annotate($0, name, "AB")) }

pattern #3 works correctly:

matched token: NamedEntitiesToken{word='a' name='C' beginPosition=0 endPosition=1}
matched token: NamedEntitiesToken{word='b' name='C' beginPosition=2 endPosition=3}
matched token: NamedEntitiesToken{word='c' name='C' beginPosition=4 endPosition=5}

I cannot find any difference between the matched tokens when I use

# In this case #3 pattern works
{  pattern: (/a/), action: (Annotate($0, name, "AB")) }
{  pattern: (/b/), action: (Annotate($0, name, "AB")) }

or when I use

# In this case #3 pattern doesn't work
# 1.
{  pattern: (/a/), action: (Annotate($0, name, "A")) }
{  pattern: (/b/), action: (Annotate($0, name, "B")) }
# 2.
{  pattern: (([name:A]) ([name:B])), action: (Annotate($0, name, "AB")) }

In both cases I get the same annotations, but the first case works and the second does not. What am I doing wrong?

Answer 1

This works for me:

# these Java classes will be used by the rules
ner = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NamedEntityTagAnnotation" }

ENV.defaults["stage"] = 1

{ ruleType: "tokens", pattern: (/a/), action: Annotate($0, ner, "A") }
{ ruleType: "tokens", pattern: (/b/), action: Annotate($0, ner, "B") }

ENV.defaults["stage"] = 2

{ ruleType: "tokens", pattern: ([{ner: "A"}] [{ner: "B"}]), action: Annotate($0, ner, "AB") }

ENV.defaults["stage"] = 3

{ ruleType: "tokens", pattern: ([{ner: "AB"}]+ /c/), action: Annotate($0, ner, "ABC") }

There is a write-up on TokensRegex here:

https://stanfordnlp.github.io/CoreNLP/tokensregex.html
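
As a sanity check on what the three stages are meant to produce for the phrase a b c, here is a minimal plain-Python model of the staged rules above. It is not TokensRegex or CoreNLP code and says nothing about how the library schedules rules internally; the function name annotate_stages and the list-based tagging are assumptions made purely for illustration.

```python
def annotate_stages(words):
    """Toy model of the three rule stages (hypothetical helper, not CoreNLP)."""
    tags = [None] * len(words)

    # Stage 1: tag single tokens, "a" -> A and "b" -> B.
    for i, w in enumerate(words):
        if w == "a":
            tags[i] = "A"
        elif w == "b":
            tags[i] = "B"

    # Stage 2: an A token immediately followed by a B token -> both become AB.
    for i in range(len(words) - 1):
        if tags[i] == "A" and tags[i + 1] == "B":
            tags[i] = tags[i + 1] = "AB"

    # Stage 3: one or more AB tokens followed by the word "c"
    # -> the whole matched span (including "c") becomes ABC.
    i = 0
    while i < len(words):
        j = i
        while j < len(words) and tags[j] == "AB":
            j += 1
        if j > i and j < len(words) and words[j] == "c":
            for k in range(i, j + 1):
                tags[k] = "ABC"
            i = j + 1
        else:
            i = max(j, i + 1)

    return list(zip(words, tags))
```

Calling annotate_stages("a b c".split()) tags all three tokens ABC, which is the result the stage 3 rule is meant to give. Each stage here reads only the tags written by the previous one, which mirrors the ordering that the ENV.defaults["stage"] settings express in the rules file.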
