Named Entity Recognition on a Python DataFrame

Date: 2018-08-24 00:11:28

Tags: python nltk ner

After removing stop words and tokenizing, my pandas DataFrame looks like this:

                                        issue_detail
0  [I, outdated, information, credit, report, I, ...
1  [This, company, refuses, provide, verification...
2  [Need, move, XXXX, facility, ., Can, longer, a...
3  [I, wrote, Equifax, 6, weeks, ago, ., They, re...
4  [I, received, inquiry, alert, Experian, XXXX/X...

I now want to run named entity recognition on it with the code below.

I am using Alvas's code from "Named Entity Recognition with Regular Expression: NLTK" as a reference:

from nltk import ne_chunk, pos_tag
from nltk.tokenize import word_tokenize
from nltk.tree import Tree

def get_continuous_chunks(text):
    chunked = ne_chunk(pos_tag(word_tokenize(text)))
    prev = None
    continuous_chunk = []
    current_chunk = []

    for i in chunked:
        if type(i) == Tree:
            current_chunk.append(" ".join([token for token, pos in i.leaves()]))
        elif current_chunk:
            named_entity = " ".join(current_chunk)
            if named_entity not in continuous_chunk:
                continuous_chunk.append(named_entity)
                current_chunk = []
        else:
            continue

    return continuous_chunk

# txt = 'The new GOP era in Washington got off to a messy start Tuesday as House Republicans,under pressure from President-elect Donald Trump.'
print (get_continuous_chunks(df))

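This code does not give me the correct result. Instead, it raises an error:

TypeError: expected string or bytes-like object

Can someone tell me how to apply named entity recognition to this DataFrame?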
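
A note on the likely cause, offered as a sketch rather than a confirmed fix: word_tokenize expects a single string, but the call above passes the whole DataFrame, which would explain the TypeError. Since each row of issue_detail already holds a list of tokens, one option is to skip word_tokenize and run pos_tag and ne_chunk row by row with Series.apply. The helper names entities_from_tree, entities_from_tokens and add_entity_column, and the output column name entities, are hypothetical and not part of the original code. The sketch also flushes an entity that ends a row, which the loop above would drop. The NLTK tagger and chunker models must already be downloaded with nltk.download for the tagging step to run.

```python
from nltk import ne_chunk, pos_tag
from nltk.tree import Tree


def entities_from_tree(chunked):
    """Collect named entities from an ne_chunk result, merging adjacent subtrees."""
    entities = []
    current = []
    for node in chunked:
        if isinstance(node, Tree):
            # A subtree is one named-entity chunk; join its tokens.
            current.append(" ".join(token for token, pos in node.leaves()))
            continue
        if current:
            # A plain (token, tag) pair ends the running entity.
            entities.append(" ".join(current))
            current = []
    if current:
        # Flush an entity that sits at the very end of the row.
        entities.append(" ".join(current))
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(entities))


def entities_from_tokens(tokens):
    """Tag and chunk one already-tokenized row (a list of strings)."""
    # Needs the NLTK tagger and chunker models to be installed.
    return entities_from_tree(ne_chunk(pos_tag(tokens)))


def add_entity_column(df, column="issue_detail", target="entities"):
    """Return a copy of df with one list of entities per row (hypothetical helper)."""
    result = df.copy()
    result[target] = result[column].apply(entities_from_tokens)
    return result
```

With the DataFrame from the question, the call would be df = add_entity_column(df). Because the stop words were removed before tagging, the tagger sees less context than it would on the raw sentences, so the entities found may be less reliable than those from the untouched text.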

0 answers:

No answers yet.