How to count named entities in a sentence after ne_chunk()

Time: 2017-06-03 14:12:05

Tags: python python-3.x nlp nltk

So, I'm learning NLP with NLTK 3, and while working through one of the examples I'm having trouble counting the named entities in a sentence. Apparently NLTK has been updated and `.node` has been removed from the tree structure. I get an error when I run my code. Here it is:

import nltk

results = []
with open('nyt.txt', 'r') as f:
    news_content = f.read()

for sent_no, sent in enumerate(nltk.sent_tokenize(news_content)):
    tokens = nltk.word_tokenize(sent)
    no_of_tokens = len(tokens)
    tagged = nltk.pos_tag(tokens)
    nouns = len([word for word, pos in tagged if pos in ["NN", "NNP"]])
    ners = nltk.ne_chunk(tagged, binary=True)
    # NLTK 3 replaced Tree.node with Tree.label(); named-entity chunks are
    # nltk.Tree subtrees, ordinary tokens are plain (word, tag) tuples
    no_of_ners = len([chunk for chunk in ners if isinstance(chunk, nltk.Tree)])
    score = (nouns + no_of_ners) / no_of_tokens
    results.append((sent_no, no_of_tokens, no_of_ners, nouns, score, sent))

results.sort(key=lambda x: x[4])
print(results[5])  # sixth-lowest score; assumes the text has at least 6 sentences
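In NLTK 3 the `Tree.node` attribute gave way to the `Tree.label()` method, so the old `.node` check no longer identifies named-entity chunks. The output of `ne_chunk()` is a `Tree` whose top-level children are either `(word, tag)` tuples or `Tree` subtrees (one per entity), so checking the type is enough to count them. A minimal sketch, assuming only that the `nltk` package is installed: the tree is built by hand in the shape `ne_chunk(tagged, binary=True)` returns (the sentence is a hypothetical example), so no tagger or chunker models need to be downloaded, and `count_named_entities` is a hypothetical helper name.

```python
from nltk import Tree


def count_named_entities(chunked):
    """Count named-entity chunks in the output of nltk.ne_chunk().

    Ordinary tokens are (word, tag) tuples; each entity is a Tree subtree,
    so a type check replaces the old hasattr(chunk, 'node') test.
    """
    return sum(1 for chunk in chunked if isinstance(chunk, Tree))


# Hand-built stand-in for nltk.ne_chunk(tagged, binary=True) (hypothetical sentence).
chunked = Tree('S', [
    Tree('NE', [('Barack', 'NNP'), ('Obama', 'NNP')]),
    ('visited', 'VBD'),
    Tree('NE', [('Paris', 'NNP')]),
    ('.', '.'),
])

print(count_named_entities(chunked))  # 2
# label() is the NLTK 3 replacement for the removed .node attribute
print([chunk.label() for chunk in chunked if isinstance(chunk, Tree)])  # ['NE', 'NE']
```

In the loop from the question, `no_of_ners = count_named_entities(ners)` would do the same job as the list comprehension.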

I need to access the named entities and count them. Can anyone help?
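To get at the entities themselves rather than only their number, note that each entity subtree's `leaves()` are the `(word, tag)` pairs it covers and its `label()` is the entity type. A sketch under the same assumptions as before (only the `nltk` package installed, trees built by hand to mimic `ne_chunk` output, hypothetical sentences and helper name `extract_entities`) that joins each entity's words and tallies them across sentences with `collections.Counter`. With the default `binary=False`, the labels are types such as `PERSON` or `GPE` rather than `NE`.

```python
from collections import Counter

from nltk import Tree


def extract_entities(chunked):
    """Return (label, text) pairs for each named-entity chunk in a ne_chunk() tree."""
    entities = []
    for chunk in chunked:
        if isinstance(chunk, Tree):
            # leaves() yields the (word, tag) tuples inside the entity chunk
            text = " ".join(word for word, tag in chunk.leaves())
            entities.append((chunk.label(), text))
    return entities


# Hand-built stand-ins for nltk.ne_chunk(tagged) with binary=False (hypothetical sentences).
sentences = [
    Tree('S', [Tree('PERSON', [('Obama', 'NNP')]), ('visited', 'VBD'),
               Tree('GPE', [('New', 'NNP'), ('York', 'NNP')]), ('.', '.')]),
    Tree('S', [Tree('GPE', [('New', 'NNP'), ('York', 'NNP')]), ('is', 'VBZ'),
               ('big', 'JJ'), ('.', '.')]),
]

counts = Counter()
for chunked in sentences:
    counts.update(extract_entities(chunked))

print(counts.most_common())  # [(('GPE', 'New York'), 2), (('PERSON', 'Obama'), 1)]
```

Keying the counter on `(label, text)` keeps, say, a person and a place with the same surface string apart; keying on the text alone would merge them.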

0 answers:

No answers yet.