How to extract entities such as names and addresses using NLTK 2.0.4 and Python 2.6.6

Date: 2017-06-15 08:13:58

Tags: nltk python-2.6

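I am trying to extract names and addresses from free text with NLTK 2.0.4 and Python 2.6.6 using the code below, but I get the error "global name 'ne_chunk_sents' is not defined". I'm not sure what the fix is; can anyone help? Or is there a better way to handle this? I also tried batch_ne_chunk, with no luck.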

import nltk

def package_get_entities(self, text):
    #text = text[0:300]
    entity_names = []
    chunked = self.get_chunked_sentences(text)
    for tree in chunked:
        entity_names.extend(self.extract_entity_names(tree))
    entity_names = list(set(entity_names))
    return entity_names

def get_chunked_sentences(self, text):
    sentences = nltk.sent_tokenize(text)
    tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
    tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
    chunked_sentences = nltk.ne_chunk_sents(tagged_sentences, binary=True)
    return chunked_sentences

def extract_entity_names(self, t):
    entity_names = []
    if hasattr(t, 'node') and t.node:
        if t.node == 'NE':
            entity_names.append(' '.join([child[0] for child in t]))
        else:
            for child in t:
                entity_names.extend(self.extract_entity_names(child))
    return entity_names
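
One possible cause, as far as I can tell: `ne_chunk_sents` is the NLTK 3 name for the batch chunker, while NLTK 2.x exposes it as `nltk.batch_ne_chunk`. Likewise, tree labels are read via `t.node` in 2.x but via `t.label()` in 3.x. A "global name ... is not defined" message can also mean the function was called without the `nltk.` prefix or without `import nltk`. The sketch below is one way to make the lookup tolerant of both releases; the helper names `resolve_chunker` and `tree_label` are hypothetical, not part of NLTK, and the code has not been run against NLTK 2.0.4 itself.

```python
def resolve_chunker(nltk_module):
    """Return the batch NE chunker under whichever name this NLTK exposes.

    Assumption: NLTK 3.x calls it ne_chunk_sents, NLTK 2.x batch_ne_chunk.
    """
    for name in ("ne_chunk_sents", "batch_ne_chunk"):
        chunker = getattr(nltk_module, name, None)
        if chunker is not None:
            return chunker
    raise AttributeError("no batch NE chunker found on %r" % (nltk_module,))


def tree_label(t):
    """Return the label of a tree node, or None for a leaf (word, tag) tuple."""
    label = getattr(t, "label", None)
    if callable(label):
        # NLTK 3.x style: Tree.label() is a method
        return label()
    # NLTK 2.x style: Tree.node is a plain attribute; leaves have neither
    return getattr(t, "node", None)


def extract_entity_names(t):
    """Collect the text of every 'NE' subtree (binary=True chunker output)."""
    entity_names = []
    label = tree_label(t)
    if label:
        if label == "NE":
            entity_names.append(" ".join(child[0] for child in t))
        else:
            for child in t:
                entity_names.extend(extract_entity_names(child))
    return entity_names
```

Inside `get_chunked_sentences`, this would be used as `chunker = resolve_chunker(nltk)` followed by `chunker(tagged_sentences, binary=True)`, with `import nltk` at the top of the module. Note that the NE chunker only marks named entities such as people, organizations and places; it is not designed to pick out full postal addresses.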

0 Answers:

No answers