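I have been experimenting with the Stanford RegexNER annotator and TokensRegex. Both work well, but I am wondering whether it is possible to do regex matching on lemmas rather than on words.
For example, I create a standard RegexNER tsv file: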
plane TRANSPORT
car TRANSPORT
...
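Can I create a TokensRegex rule with the following logic: "if current token has lemma which has a match in TRANSPORT class, mark it as TRANSPORT"?
The goal is to also tag "planes", "cars", and so on as TRANSPORT without having to list all of these variants in the tsv file.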
Answer 0 (score: 0)
Command:
java -Xmx4g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner -ner.additional.tokensregex.rules basic-ner.rules -file lemma-example.txt -outputFormat text
basic-ner.rules
# these Java classes will be used by the rules
ner = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NamedEntityTagAnnotation" }
tokens = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$TokensAnnotation" }
$TRANSPORT_TYPES = "/car|plane/"
# rule for recognizing transport
{ ruleType: "tokens", pattern: ([{lemma:$TRANSPORT_TYPES}]), action: Annotate($0, ner, "TRANSPORT"), result: "TRANSPORT" }
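To make the effect of the rule above concrete, here is a minimal plain-Java sketch of the same decision: each token's lemma is checked against the car|plane alternation and, on a match, the token is tagged TRANSPORT. This is not the CoreNLP API. The class name LemmaTagSketch, the tag method, and the hand-supplied lemmas are hypothetical and used only for illustration. The sketch assumes the pattern must match the whole lemma, and it uses "O" as a placeholder for whatever tag the rest of the NER pipeline would otherwise assign.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Plain-Java illustration only; this does NOT call CoreNLP.
class LemmaTagSketch {
    // Same alternation as $TRANSPORT_TYPES in basic-ner.rules.
    private static final Pattern TRANSPORT_TYPES = Pattern.compile("car|plane");

    // Each row is {word, lemma}. The lemmas are supplied by hand here;
    // in the real pipeline they come from the lemma annotator.
    static List<String> tag(String[][] tokens) {
        List<String> tags = new ArrayList<>();
        for (String[] token : tokens) {
            String lemma = token[1];
            // matches() requires the whole lemma to match (an assumption about
            // the rule's behavior), so the lemma "carpet" is not tagged.
            boolean isTransport = TRANSPORT_TYPES.matcher(lemma).matches();
            tags.add(isTransport ? "TRANSPORT" : "O");
        }
        return tags;
    }

    public static void main(String[] args) {
        String[][] tokens = {
            {"The", "the"}, {"planes", "plane"}, {"and", "and"},
            {"cars", "car"}, {"left", "leave"}, {"the", "the"},
            {"carpet", "carpet"}
        };
        // Print one tag per token, in order.
        System.out.println(tag(tokens));
    }
}
```

Because the check runs on the lemma rather than the surface form, "planes" and "cars" are covered by the single entries "plane" and "car", which is the behavior the question asks for.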
More information on NER and TokensRegex is available here: