Stanford NLP regex on lemmas

Posted: 2019-01-21 07:36:10

Tags: stanford-nlp

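I have been experimenting with the Stanford RegexNER annotator and TokensRegex. They work well, but I was wondering whether it is possible to run the regex matching against lemmas instead of the surface words.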

For example, I create the standard RegexNER tsv file:

plane   TRANSPORT
car     TRANSPORT
...

Can I create a TokensRegex rule with the following logic: if the current token has a lemma that matches an entry in the TRANSPORT class, mark it as TRANSPORT?

The goal is to also tag planes, cars, etc. as TRANSPORT without having to list all of these variants in the tsv file.
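The logic described above can be illustrated in plain Java, independently of CoreNLP: load the tsv entries into a map from lemma to class, then tag each token by looking up its lemma rather than its surface form. This is only a sketch of the idea; the class name `LemmaGazetteer` is hypothetical, and the lemmas are supplied by hand as a stand-in for what CoreNLP's `lemma` annotator would produce.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not a CoreNLP class.
class LemmaGazetteer {
    // Maps a lemma (e.g. "plane") to its entity class (e.g. "TRANSPORT").
    private final Map<String, String> lemmaToClass = new HashMap<>();

    // Parses tab-separated "word<TAB>CLASS" lines like the RegexNER tsv file.
    LemmaGazetteer(List<String> tsvLines) {
        for (String line : tsvLines) {
            String[] cols = line.split("\t");
            if (cols.length >= 2) {
                lemmaToClass.put(cols[0].trim(), cols[1].trim());
            }
        }
    }

    // Returns one tag per token: the class of its lemma, or "O" if no entry matches.
    List<String> tag(List<String> lemmas) {
        List<String> tags = new ArrayList<>();
        for (String lemma : lemmas) {
            tags.add(lemmaToClass.getOrDefault(lemma, "O"));
        }
        return tags;
    }

    public static void main(String[] args) {
        LemmaGazetteer g = new LemmaGazetteer(List.of("plane\tTRANSPORT", "car\tTRANSPORT"));
        // Lemmas for the tokens "planes and cars", written by hand (assumed lemmatizer output).
        List<String> lemmas = List.of("plane", "and", "car");
        System.out.println(g.tag(lemmas)); // prints [TRANSPORT, O, TRANSPORT]
    }
}
```

Because the lookup key is the lemma, only the base forms need to appear in the tsv file; the plural surface forms are covered as long as the lemmatizer reduces them to those base forms.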

Answer (score: 0):

Command:

java -Xmx4g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner -ner.additional.tokensregex.rules basic-ner.rules -file lemma-example.txt -outputFormat text

Contents of basic-ner.rules:

# these Java classes will be used by the rules
ner = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NamedEntityTagAnnotation" }
tokens = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$TokensAnnotation" }

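# lemmas to tag as TRANSPORT; matching on the lemma also covers plurals such as "cars" and "planes"
$TRANSPORT_TYPES = "/car|plane/"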

# rule for recognizing transport
{ ruleType: "tokens", pattern: ([{lemma:$TRANSPORT_TYPES}]), action: Annotate($0, ner, "TRANSPORT"), result: "TRANSPORT" }
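If you would rather keep the word lists in the tsv file than retype them as regexes, the rules above could be generated from it. The following is a minimal sketch of a hypothetical helper (`LemmaRulesGenerator` is not part of CoreNLP) that groups `word<TAB>CLASS` lines by class and emits one lemma rule per class in the same shape as basic-ner.rules. It assumes every entry is a single lowercase word with no regex metacharacters and that class names are valid as variable names; multi-word entries are skipped because a `[{lemma:...}]` pattern matches one token at a time.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: turns RegexNER-style tsv lines into lemma-based TokensRegex rules.
class LemmaRulesGenerator {
    static String generate(List<String> tsvLines) {
        // Sorted maps/sets so the generated rules are deterministic.
        Map<String, TreeSet<String>> wordsByClass = new TreeMap<>();
        for (String line : tsvLines) {
            String[] cols = line.split("\t");
            if (cols.length < 2) continue;
            String word = cols[0].trim();
            String cls = cols[1].trim();
            // A single-token lemma pattern cannot match multi-word entries, so skip them.
            if (word.isEmpty() || word.contains(" ")) continue;
            wordsByClass.computeIfAbsent(cls, k -> new TreeSet<>()).add(word);
        }
        StringBuilder sb = new StringBuilder();
        sb.append("ner = { type: \"CLASS\", value: \"edu.stanford.nlp.ling.CoreAnnotations$NamedEntityTagAnnotation\" }\n");
        for (Map.Entry<String, TreeSet<String>> e : wordsByClass.entrySet()) {
            String cls = e.getKey();
            String var = "$" + cls + "_TYPES";
            // e.g. $TRANSPORT_TYPES = "/car|plane/"
            sb.append(var).append(" = \"/").append(String.join("|", e.getValue())).append("/\"\n");
            // One rule per class, annotating matching tokens with that class.
            sb.append("{ ruleType: \"tokens\", pattern: ([{lemma:").append(var)
              .append("}]), action: Annotate($0, ner, \"").append(cls)
              .append("\"), result: \"").append(cls).append("\" }\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(generate(List.of("plane\tTRANSPORT", "car\tTRANSPORT")));
    }
}
```

For the two-line tsv from the question, the generated variable is `$TRANSPORT_TYPES = "/car|plane/"`, followed by the same rule as above; the output could be written to a file and passed via `-ner.additional.tokensregex.rules`.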

More information on NER and TokensRegex is available here:

https://stanfordnlp.github.io/CoreNLP/ner.html

https://stanfordnlp.github.io/CoreNLP/tokensregex.html