After removing stop words and tokenizing, my Python DataFrame looks like this:
```
   issue_detail
0  [I, outdated, information, credit, report, I, ...
1  [This, company, refuses, provide, verification...
2  [Need, move, XXXX, facility, ., Can, longer, a...
3  [I, wrote, Equifax, 6, weeks, ago, ., They, re...
4  [I, received, inquiry, alert, Experian, XXXX/X...
```
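Each cell of `issue_detail` is already a Python list of tokens, not a string. The minimal sketch below uses a toy DataFrame built from the first visible tokens of rows 0 and 3 (the real rows are truncated above). It shows how to confirm the cell type and, if a string-based function needs it, how to rebuild one string per row with pandas' `Series.str.join`. The column name `issue_text` is hypothetical.

```python
import pandas as pd

# Toy frame shaped like the one above: each cell holds a list of tokens.
# Tokens are the first visible ones from rows 0 and 3 (the rest is truncated).
df = pd.DataFrame({
    "issue_detail": [
        ["I", "outdated", "information", "credit", "report"],
        ["I", "wrote", "Equifax", "6", "weeks", "ago", "."],
    ]
})

print(type(df["issue_detail"].iloc[0]))  # <class 'list'>

# Series.str.join joins the elements of each list with the separator,
# giving one plain string per row (hypothetical column name).
df["issue_text"] = df["issue_detail"].str.join(" ")
print(df["issue_text"].tolist())
```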
Now I want to apply named entity recognition with the code below, using Alvas's answer to "Named Entity Recognition with Regular Expression: NLTK" as a reference:
```python
from nltk import ne_chunk, pos_tag
from nltk.tokenize import word_tokenize
from nltk.tree import Tree

def get_continuous_chunks(text):
    chunked = ne_chunk(pos_tag(word_tokenize(text)))
    prev = None
    continuous_chunk = []
    current_chunk = []
    for i in chunked:
        if type(i) == Tree:
            current_chunk.append(" ".join([token for token, pos in i.leaves()]))
        elif current_chunk:
            named_entity = " ".join(current_chunk)
            if named_entity not in continuous_chunk:
                continuous_chunk.append(named_entity)
            current_chunk = []
        else:
            continue
    return continuous_chunk

# txt = 'The new GOP era in Washington got off to a messy start Tuesday as House Republicans,under pressure from President-elect Donald Trump.'
print(get_continuous_chunks(df))
```
This code does not give me the correct result; instead it raises an error:

TypeError: expected string or bytes-like object
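A likely explanation: `word_tokenize` expects a single string, but here it receives the whole DataFrame. NLTK's tokenizers run regular expressions over their input, and Python's `re` module raises exactly this `TypeError` when handed something that is not `str` or `bytes`. The sketch below reproduces the message with `re` alone. `regex_error_message` is a hypothetical helper for illustration, not part of NLTK.

```python
import re


def regex_error_message(value):
    """Return the TypeError message re raises for `value`, or None if accepted."""
    try:
        re.findall(r"\S+", value)
    except TypeError as err:
        return str(err)
    return None


# A plain string is accepted, so no error message is returned.
print(regex_error_message("I wrote Equifax"))  # None

# A list of tokens (like one issue_detail cell) or a whole DataFrame is
# rejected with the message from the question; Python 3.12+ also appends
# the offending type, e.g. ", got 'list'".
print(regex_error_message(["I", "wrote", "Equifax"]))
```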
How can I apply named entity recognition to this DataFrame?
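One possible approach, sketched under the assumption that every `issue_detail` cell is a list of token strings. Skip `word_tokenize`, because the text is already tokenized. Pass each row's list straight to `pos_tag` and `ne_chunk`, and apply that per row with `Series.apply` instead of passing the whole DataFrame. `merge_entity_chunks`, `entities_from_tokens`, `add_entity_column`, and the `entities` column are hypothetical names. The merging logic follows the question's function with one change: an entity at the very end of the token list is also kept, whereas the original loop only records an entity once a non-entity token follows it.

```python
import pandas as pd
from nltk import ne_chunk, pos_tag
from nltk.tree import Tree


def merge_entity_chunks(chunked):
    """Join runs of adjacent named-entity subtrees into entity strings.

    `chunked` is the output of nltk.ne_chunk: its items are either Tree
    nodes (entities) or plain (token, tag) tuples.
    """
    entities = []
    current = []

    def flush():
        # Close the current run of entity tokens, skipping duplicates.
        if current:
            name = " ".join(current)
            if name not in entities:
                entities.append(name)
            current.clear()

    for node in chunked:
        if isinstance(node, Tree):
            current.append(" ".join(token for token, _tag in node.leaves()))
        else:
            flush()
    flush()  # keep an entity that ends the token list
    return entities


def entities_from_tokens(tokens):
    """POS-tag and NE-chunk one already-tokenized row."""
    return merge_entity_chunks(ne_chunk(pos_tag(tokens)))


def add_entity_column(df, column="issue_detail"):
    """Run NER row by row; each cell must be a list of token strings."""
    out = df.copy()
    out["entities"] = out[column].apply(entities_from_tokens)
    return out
```

Usage, assuming the NLTK models are installed: `df = add_entity_column(df)`. The tagger and chunker need NLTK data fetched with `nltk.download`, for example `averaged_perceptron_tagger`, `maxent_ne_chunker`, and `words`. Recent NLTK releases use `averaged_perceptron_tagger_eng` and `maxent_ne_chunker_tab` instead. Because stop words were already removed, the chunker sees less context than in the original sentences. Running it on the untokenized text may therefore give better results.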