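The real patterns are not written in English, so I created this simplified example that reproduces the problem. There are 3 levels of annotation (the real application needs all of them), and the level-3 pattern does not work as expected. The phrase to recognize is: a b c
I expected the following rules to work: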
# 1.
{ pattern: (/a/), action: (Annotate($0, name, "A")) }
{ pattern: (/b/), action: (Annotate($0, name, "B")) }
# 2.
{ pattern: (([name:A]) ([name:B])), action: (Annotate($0, name, "AB")) }
# 3.
{ pattern: (([name:AB]+) /c/), action: (Annotate($0, name, "C")) }
#1 and #2 work, and "a b" is annotated:
Matched token: NamedEntitiesToken {word='a' name='AB' beginPosition=0 endPosition=1}
Matched token: NamedEntitiesToken {word='b' name='AB' beginPosition=2 endPosition=3}
However, pattern #3 does not match, even though there are clearly 2 tokens annotated "AB", which is exactly what pattern #3 expects. If I change #1 to
{ pattern: (/a/), action: (Annotate($0, name, "AB")) }
{ pattern: (/b/), action: (Annotate($0, name, "AB")) }
then pattern #3 works correctly:
Matched token: NamedEntitiesToken {word='a' name='C' beginPosition=0 endPosition=1}
Matched token: NamedEntitiesToken {word='b' name='C' beginPosition=2 endPosition=3}
Matched token: NamedEntitiesToken {word='c' name='C' beginPosition=4 endPosition=5}
I can't find any difference between the matched tokens when I use
# In this case pattern #3 works
{ pattern: (/a/), action: (Annotate($0, name, "AB")) }
{ pattern: (/b/), action: (Annotate($0, name, "AB")) }
and when I use
# In this case pattern #3 doesn't work
# 1.
{ pattern: (/a/), action: (Annotate($0, name, "A")) }
{ pattern: (/b/), action: (Annotate($0, name, "B")) }
# 2.
{ pattern: (([name:A]) ([name:B])), action: (Annotate($0, name, "AB")) }
I get the same annotations in both cases, yet the first case works and the second does not. What am I doing wrong?
Answer (score: 0)
This works for me:
# these Java classes will be used by the rules
ner = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NamedEntityTagAnnotation" }
ENV.defaults["stage"] = 1
{ ruleType: "tokens", pattern: (/a/), action: Annotate($0, ner, "A") }
{ ruleType: "tokens", pattern: (/b/), action: Annotate($0, ner, "B") }
ENV.defaults["stage"] = 2
{ ruleType: "tokens", pattern: ([{ner: "A"}] [{ner: "B"}]), action: Annotate($0, ner, "AB") }
ENV.defaults["stage"] = 3
{ ruleType: "tokens", pattern: ([{ner: "AB"}]+ /c/), action: Annotate($0, ner, "ABC") }
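Compared with the question, the answer's rules bind `ner` to `NamedEntityTagAnnotation`, give every rule `ruleType: "tokens"`, and put each annotation level in its own stage with `ENV.defaults["stage"]`. The sketch below is a toy model in plain Java, not CoreNLP code. It illustrates only the property the answer relies on: stages run in increasing order, so a rule in a later stage sees the tags written by earlier stages. How rules inside a single stage interact in CoreNLP is not modelled; the toy simply applies them in list order. The class `StagedTagging`, its greedy matcher (no backtracking, unlike real TokensRegex) and the helpers `word`, `tag` and `tagPlus` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.BiPredicate;

// Toy model of staged token tagging (NOT CoreNLP). Each token carries one tag,
// standing in for the NER tag that the rules above read and write.
public class StagedTagging {

    // One pattern element: a test on (word, current tag), optionally repeatable ("+").
    static final class Elem {
        final BiPredicate<String, String> test;
        final boolean plus;

        Elem(BiPredicate<String, String> test, boolean plus) {
            this.test = test;
            this.plus = plus;
        }
    }

    // Hypothetical helpers mirroring /a/, [{ner: "A"}] and [{ner: "AB"}]+.
    static Elem word(String w) { return new Elem((wd, tg) -> wd.equals(w), false); }
    static Elem tag(String t) { return new Elem((wd, tg) -> t.equals(tg), false); }
    static Elem tagPlus(String t) { return new Elem((wd, tg) -> t.equals(tg), true); }

    // A rule: its stage, the tag written to every matched token, and the pattern.
    static final class Rule {
        final int stage;
        final String newTag;
        final List<Elem> pattern;

        Rule(int stage, String newTag, Elem... pattern) {
            this.stage = stage;
            this.newTag = newTag;
            this.pattern = Arrays.asList(pattern);
        }
    }

    // Greedy match starting at 'start'; returns the exclusive end index, or -1.
    // There is no backtracking, which is enough for the patterns used here.
    static int matchAt(Rule r, String[] words, String[] tags, int start) {
        int i = start;
        for (Elem e : r.pattern) {
            if (i >= words.length || !e.test.test(words[i], tags[i])) return -1;
            i++;
            if (e.plus) {
                while (i < words.length && e.test.test(words[i], tags[i])) i++;
            }
        }
        return i;
    }

    // Applies rules stage by stage; inside a stage, rules run in list order
    // (List.sort is stable). Matched tokens are retagged in place.
    static String[] run(String[] words, List<Rule> rules) {
        String[] tags = new String[words.length];
        List<Rule> ordered = new ArrayList<>(rules);
        ordered.sort(Comparator.comparingInt((Rule r) -> r.stage));
        for (Rule r : ordered) {
            for (int s = 0; s < words.length; s++) {
                int end = matchAt(r, words, tags, s);
                if (end > s) {
                    for (int k = s; k < end; k++) tags[k] = r.newTag;
                    s = end - 1; // continue scanning after the match
                }
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        // The answer's four rules, listed deliberately out of order:
        // in the toy, the stage numbers, not the list position, decide when each runs.
        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule(3, "ABC", tagPlus("AB"), word("c")));
        rules.add(new Rule(2, "AB", tag("A"), tag("B")));
        rules.add(new Rule(1, "A", word("a")));
        rules.add(new Rule(1, "B", word("b")));
        String[] tags = run(new String[] {"a", "b", "c"}, rules);
        System.out.println(Arrays.toString(tags));
    }
}
```

Running `main` prints `[ABC, ABC, ABC]` even though the rules were added in reverse order, because the toy sorts them by stage before applying them.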
There is documentation about TokensRegex here: