Stanford NER does not extract percentages correctly

Time: 2017-01-09 11:50:55

Tags: stanford-nlp named-entity-recognition

I am trying to extract percentages with Stanford NER (through NLTK's StanfordNERTagger), but it does not extract them correctly.

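from nltk.tag import StanfordNERTagger

# Assumes the Stanford NER jar and model are discoverable through the
# CLASSPATH and STANFORD_MODELS environment variables.
inp_str = 'total revenue received was one hundred and twenty five percent 125% for last financial year'
split_inp_str = inp_str.split()  # whitespace split keeps '125%' as one token
st = StanfordNERTagger('english.muc.7class.distsim.crf.ser.gz')
print(st.tag(split_inp_str))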

This gives the following output:

[('total', 'O'), ('revenue', 'O'), ('received', 'O'), ('was', 'O'), ('one', 'O'), ('hundred', 'O'), ('and', 'O'), ('twenty', 'O'), ('five', 'PERCENT'), ('percent', 'PERCENT'), ('125%', 'O'), ('for', 'O'), ('last', 'O'), ('financial', 'O'), ('year', 'O')]

Why is 125% not extracted, and why is only part of "one hundred and twenty five percent" tagged as PERCENT?

1 answer:

Answer 0 (score: -1)

You need to tokenize the sentence with a proper tokenizer instead of split(). Try the following code.

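from nltk import word_tokenize
from nltk.tag import StanfordNERTagger

# Assumes the Stanford NER jar and model are discoverable through the
# CLASSPATH and STANFORD_MODELS environment variables.
inp_str = 'total revenue received was one hundred and twenty five percent 125% for last financial year'
split_inp_str = word_tokenize(inp_str)  # tokenizer-based split instead of str.split()
st = StanfordNERTagger('english.muc.7class.distsim.crf.ser.gz')
print(st.tag(split_inp_str))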
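To see why the tokenizer might matter here, the sketch below compares str.split() with a Treebank-style split that separates '%' from the number. It uses only the standard library; simple_tokenize and TOKEN_PATTERN are hypothetical stand-ins for illustration, not part of NLTK or Stanford NER, and they only approximate what word_tokenize does. The assumption is that the tagger's model expects '125' and '%' as separate tokens, which a whitespace split never produces.

```python
import re

# Hypothetical helper (not an NLTK or Stanford NER API): numbers, words,
# and single punctuation marks each become their own token.
TOKEN_PATTERN = re.compile(r"\d+(?:\.\d+)?|\w+|[^\w\s]")


def simple_tokenize(text):
    """Split text into tokens, separating punctuation such as '%'."""
    return TOKEN_PATTERN.findall(text)


inp_str = ('total revenue received was one hundred and twenty five '
           'percent 125% for last financial year')

whitespace_tokens = inp_str.split()
regex_tokens = simple_tokenize(inp_str)

# Whitespace splitting keeps the percent sign glued to the number.
print([t for t in whitespace_tokens if '%' in t])      # ['125%']

# The regex tokenizer yields the number and the sign separately.
print([t for t in regex_tokens if t in ('125', '%')])  # ['125', '%']
```

Whether the tagger then labels '125' and '%' as PERCENT still depends on the model, so it is worth checking the tagged output rather than assuming it.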