Question

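I have been experimenting with the Stanford RegexNER annotator and TokensRegex. Both work well, but I would like to know whether a regular expression can be matched against lemmas rather than words.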

For example, I create a standard RegexNER TSV file:

plane   TRANSPORT
car     TRANSPORT
...

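Can I create a TokensRegex rule with the following logic: if the current token has a lemma that matches an entry in the TRANSPORT class, mark it as TRANSPORT?

The goal is to also tag planes, cars, and so on as TRANSPORT without having to list every such variant in the TSV file.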

Answer 1

Command

java -Xmx4g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner -ner.additional.tokensregex.rules basic-ner.rules -file lemma-example.txt -outputFormat text

basic-ner.rules

# these Java classes will be used by the rules
ner = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NamedEntityTagAnnotation" }
tokens = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$TokensAnnotation" }

$TRANSPORT_TYPES = "/car|plane/"

# rule for recognizing transport
{ ruleType: "tokens", pattern: ([{lemma:$TRANSPORT_TYPES}]), action: Annotate($0, ner, "TRANSPORT"), result: "TRANSPORT" }
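To see why matching on the lemma covers the inflected forms, here is a minimal standalone sketch in plain Java. It does not call CoreNLP: the class name LemmaTagSketch, the tag helper, and the hand-written word/lemma pairs are all hypothetical, and it assumes the token-level pattern has to match the whole lemma string, which is how I understand TokensRegex string patterns to behave. "O" stands in for the label of a token that is not an entity.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class LemmaTagSketch {
    // Same alternation as $TRANSPORT_TYPES in basic-ner.rules.
    static final Pattern TRANSPORT_TYPES = Pattern.compile("car|plane");

    // Returns "TRANSPORT" when the whole string matches the pattern, otherwise "O".
    // Assumption: a full match mirrors what the TokensRegex rule does per token.
    static String tag(String lemma) {
        return TRANSPORT_TYPES.matcher(lemma).matches() ? "TRANSPORT" : "O";
    }

    public static void main(String[] args) {
        // Hypothetical word -> lemma pairs written by hand for illustration;
        // in the real pipeline the lemma annotator supplies the lemma.
        Map<String, String> lemmas = new LinkedHashMap<>();
        lemmas.put("planes", "plane");
        lemmas.put("cars", "car");
        lemmas.put("carpet", "carpet");

        for (Map.Entry<String, String> e : lemmas.entrySet()) {
            // Tag by lemma, not by surface form.
            System.out.println(e.getKey() + "\t" + e.getValue() + "\t" + tag(e.getValue()));
        }
    }
}
```

The surface form "planes" does not match car|plane as a whole string, but its lemma "plane" does, so the TSV file or rule only needs the base forms. This is also why the lemma annotator has to appear before ner in the -annotators list.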

More information about NER and TokensRegex is available here:

https://stanfordnlp.github.io/CoreNLP/ner.html

https://stanfordnlp.github.io/CoreNLP/tokensregex.html
