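I am trying to use NLTK to identify people, organizations, and places in sentences.
My use case is mainly extracting the auditor's name, organization, and location from annual financial reports.
With NLTK in Python, the results are not satisfactory: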
import nltk
from nltk.chunk import ne_chunk
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag
ex='Alastair John Richard Nuttall (Senior statutory auditor) for and on behalf of Ernst & Young LLP (Statutory auditor) Leeds'
ne_tree = ne_chunk(pos_tag(word_tokenize(ex)))
print(ne_tree)
Tree('S', [Tree('PERSON', [('Alastair', 'NNP')]), Tree('PERSON', [('John', 'NNP'), ('Richard', 'NNP'), ('Nuttall', 'NNP')]), ('(', '('), Tree('ORGANIZATION', [('Senior', 'NNP')]), ('statutory', 'NNP'), ('auditor', 'NN'), (')', ')'), ('for', 'IN'), ('and', 'CC'), ('on', 'IN'), ('behalf', 'NN'), ('of', 'IN'), Tree('GPE', [('Ernst', 'NNP')]), ('&', 'CC'), Tree('PERSON', [('Young', 'NNP'), ('LLP', 'NNP')]), ('(', '('), ('Statutory', 'NNP'), ('auditor', 'NN'), (')', ')'), ('Leeds', 'NNS')])
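As shown above, "Leeds" is not identified as a place, and "Ernst & Young LLP" is not recognized as an organization: it is split into a GPE ("Ernst") and a PERSON ("Young LLP").
Is there a better way to do this in Python?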
Answer 0 (score: 1)
Try using spaCy instead of NLTK:
https://spacy.io/usage/linguistic-features#named-entities
I think spaCy's pretrained models will probably perform better. The result for your sentence (with spaCy 2.1, en_core_web_lg) is:
Alastair John Richard Nuttall PERSON
Ernst & Young LLP ORG
Leeds GPE
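For reference, output like the above can be produced with something along these lines. This is only a minimal sketch: the helper names `extract_entities` and `load_pipeline` are made up for illustration, and it assumes the model has already been installed with `python -m spacy download en_core_web_lg`. Results may differ with other spaCy versions or models.

```python
def extract_entities(nlp, text, labels=("PERSON", "ORG", "GPE")):
    """Return (entity text, entity label) pairs, keeping only the given labels.

    `nlp` is any callable that returns an object with an `ents` attribute,
    such as a loaded spaCy pipeline. (Hypothetical helper, not part of spaCy.)
    """
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents if ent.label_ in labels]


def load_pipeline(model_name="en_core_web_lg"):
    """Load a pretrained spaCy pipeline (assumes the model is installed)."""
    # Imported here so extract_entities can be used and tested without spaCy.
    import spacy
    return spacy.load(model_name)


# Example usage (requires spaCy and the model to be installed):
#
# ex = ('Alastair John Richard Nuttall (Senior statutory auditor) for and on '
#       'behalf of Ernst & Young LLP (Statutory auditor) Leeds')
# nlp = load_pipeline()
# for text, label in extract_entities(nlp, ex):
#     print(text, label)
```

Filtering on PERSON, ORG and GPE keeps only the three entity types the question asks about; pass a different `labels` tuple if you also need dates or other types from the reports.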