Named Entity Recognition: adding more columns to RegexNER for reference

Time: 2019-01-22 10:22:17

Tags: stanford-nlp

Is there a way to do the following?

Add another column to the regexner.mapping file that describes some aspect of the named entity, for example:

Bachelor of Engineering	DEGREE	2.0	some_data_information_1
Lalor	LOCATION	PERSON	2.0	some_data_information_2
Labor	ORGANIZATION	2.0	some_data_information_3

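The idea is that this information would be accessible whenever an entity mention is detected. For instance, some_data_information could be a key into another database or any other data store.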

List<CoreMap> entityMentions = document.get(MentionsAnnotation.class);

for (CoreMap entityMention : entityMentions) {
  //get the information in the description column...
  entityMention.get( ... );
} 

Is this possible?

1 Answer:

Answer 0: (score: 0)

RegexNER does not currently support that kind of functionality. You can write TokensRegex rules to do this instead:

# make all patterns case-sensitive
ENV.defaultStringMatchFlags = 0
ENV.defaultStringPatternFlags = 0

# these Java classes will be used by the rules
ner = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NamedEntityTagAnnotation" }
nerInfo = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NERInfo" }
tokens = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$TokensAnnotation" }

# define some regexes over tokens
$COMPANY_BEGINNING = "/[A-Z][A-Za-z]+/"
$COMPANY_ENDING = "/(Corp|Inc)\.?/"

# rule for recognizing company names
{ ruleType: "tokens", pattern: ([{word:$COMPANY_BEGINNING} & {tag:"NNP"}]+ [{word:$COMPANY_ENDING}]), action: (Annotate($0, ner, "COMPANY"), Annotate($0, nerInfo, "COMPANY_INFO")), result: "COMPANY_RESULT" }


Note: replace "edu.stanford.nlp.ling.CoreAnnotations$NERInfo" with a class you define. That class does not exist; it is listed only as an example.
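As a rough illustration of what "a class you define" could look like, here is a minimal sketch of an annotation key that carries a String value. The names CustomAnnotations and NERInfoAnnotation are hypothetical. The small CoreAnnotation interface declared at the top is only a stand-in so the sketch compiles without CoreNLP on the classpath; in a real project you would delete it and import edu.stanford.nlp.ling.CoreAnnotation instead.

```java
// Stand-in for edu.stanford.nlp.ling.CoreAnnotation (assumption: used here only
// so the sketch compiles on its own; remove it and import the CoreNLP interface).
interface CoreAnnotation<V> {
    Class<V> getType();
}

// Hypothetical holder class; the class names are illustrative only.
class CustomAnnotations {

    // Annotation key for the extra "info" value set by the rule's Annotate action.
    static class NERInfoAnnotation implements CoreAnnotation<String> {
        @Override
        public Class<String> getType() {
            return String.class;
        }
    }

    // Binary class name to reference from the rules file.
    // Nested classes are separated from their outer class by '$'.
    static String rulesFileClassName() {
        return NERInfoAnnotation.class.getName();
    }

    public static void main(String[] args) {
        // Prints the line that would replace the nerInfo definition in the rules.
        System.out.println("nerInfo = { type: \"CLASS\", value: \""
                + rulesFileClassName() + "\" }");
    }
}
```

The nerInfo line in the rules file would then point at this class's binary name (including its package, if it has one) rather than at CoreAnnotations$NERInfo. How the value is read back afterwards depends on which CoreMap the rule annotated, which this sketch does not cover.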

Full details on using TokensRegex are here: https://stanfordnlp.github.io/CoreNLP/tokensregex.html
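If switching to TokensRegex is not an option, one possible workaround (not part of the answer above, and only a sketch) is to keep the extra column in a side table that you maintain yourself and look it up by mention text after annotation. The class MentionInfoLookup and its tab-separated layout are assumptions. It only handles literal patterns, not regular expressions, and it does not assume RegexNER itself will accept a file with the extra column.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that maps a literal pattern to its extra "info" column.
class MentionInfoLookup {
    private final Map<String, String> infoByText = new LinkedHashMap<>();

    // Assumed layout per line: pattern <TAB> ...other columns... <TAB> info
    MentionInfoLookup(List<String> mappingLines) {
        for (String line : mappingLines) {
            String[] cols = line.split("\t");
            if (cols.length < 3) {
                continue; // skip blank lines and lines without an info column
            }
            // First column is the pattern text, last column is the extra info.
            infoByText.put(cols[0].trim(), cols[cols.length - 1].trim());
        }
    }

    // Returns the info for an exact mention text, or null if there is none.
    String infoFor(String mentionText) {
        return infoByText.get(mentionText);
    }

    public static void main(String[] args) {
        MentionInfoLookup lookup = new MentionInfoLookup(Arrays.asList(
                "Lalor\tLOCATION\tPERSON\t2.0\tsome_data_information_2",
                "Labor\tORGANIZATION\t2.0\tsome_data_information_3"));
        System.out.println(lookup.infoFor("Labor"));
    }
}
```

Inside the loop over entity mentions, you would pass the mention's text to infoFor and use the returned key to query your other database.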