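Is there a way to do the following?
I would like to add another column to the regexner.mapping file that describes some aspect of the named entity, for example: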
Bachelor of Engineering DEGREE 2.0 some_data_information_1
Lalor LOCATION PERSON 2.0 some_data_information_2
Labor ORGANIZATION 2.0 some_data_information_3
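The idea is that this information would be accessible whenever an entity mention is detected. For example, some_data_information
could be a key into another database or any other external store.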
List<CoreMap> entityMentions = document.get(MentionsAnnotation.class);
for (CoreMap entityMention : entityMentions) {
//get the information in the description column...
entityMention.get( ... );
}
Is this possible?
Answer 0 (score: 0)
RegexNER does not currently support that kind of functionality. You can write TokensRegex rules to do this instead.
# make all patterns case-sensitive
ENV.defaultStringMatchFlags = 0
ENV.defaultStringPatternFlags = 0
# these Java classes will be used by the rules
ner = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NamedEntityTagAnnotation" }
nerInfo = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NERInfo" }
tokens = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$TokensAnnotation" }
# define some regexes over tokens
$COMPANY_BEGINNING = "/[A-Z][A-Za-z]+/"
$COMPANY_ENDING = "/(Corp|Inc)\.?/"
# rule for recognizing company names
{ ruleType: "tokens", pattern: ([{word:$COMPANY_BEGINNING} & {tag:"NNP"}]+ [{word:$COMPANY_ENDING}]), action: (Annotate($0, ner, "COMPANY"), Annotate($0, nerInfo, "COMPANY_INFO")), result: "COMPANY_RESULT" }
// replace "edu.stanford.nlp.ling.CoreAnnotations$NERInfo" with a class you define (that class does not exist, I just list it as an example.)
Full details on using TokensRegex are here: https://stanfordnlp.github.io/CoreNLP/tokensregex.html
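If a custom annotation class is more than you need, one alternative is to keep the extra column in a side table and look it up by the mention text after annotation. This is not what the rules above do, and it is not a RegexNER feature. It is only a minimal sketch in plain Java, assuming tab-separated lines whose first field is the literal entity text and whose last field is the description column. The class name `MentionInfoLookup` and its methods are hypothetical, and exact-text matching is a simplification, since real RegexNER patterns are regular expressions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of CoreNLP: maps entity text to the extra column.
public class MentionInfoLookup {
    private final Map<String, String> infoByText = new HashMap<>();

    // Each line is assumed to be tab-separated: the first field is the literal
    // entity text, the last field is the extra description column.
    public MentionInfoLookup(List<String> lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comment lines
            }
            String[] fields = trimmed.split("\t");
            if (fields.length < 2) {
                continue; // no extra column on this line
            }
            infoByText.put(fields[0].trim(), fields[fields.length - 1].trim());
        }
    }

    // Returns the description for a mention's text, or empty if none is recorded.
    public Optional<String> infoFor(String mentionText) {
        return Optional.ofNullable(infoByText.get(mentionText));
    }
}
```

Inside the loop over entity mentions, the text could be read with `entityMention.get(CoreAnnotations.TextAnnotation.class)` and passed to `infoFor`. The value returned can then serve as the key into another database, as the question describes.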