Checking vocabulary

Time: 2016-01-17 17:58:02

Tags: python nltk

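I want to get the 50 most common words from a corpus and then check whether each of them appears in a sentence. I want to iterate over all the sentences and print a vector for each one (1 if the word is in the sentence, 0 otherwise). I wrote this code, but it only shows 0 (False). Any ideas?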

import nltk
from nltk import FreqDist
from nltk.corpus import brown


news = brown.words(categories='news') 
news_sents = brown.sents(categories='news')

fdist = FreqDist(w.lower() for w in news) 
word_features = list(fdist.values())[:50]

num_sents = len(news_sents)

for i in range(num_sents):
    features = {}
    for word in word_features:
        features[word] = int(word in news_sents[i])
print(features)
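
A few things in the code above probably explain the all-zero output. `fdist.values()` returns the counts rather than the words, so integers are being looked up in lists of strings. The sentence tokens are not lowercased, although the frequency table is. The final `print` also sits outside the loop, so only the last sentence would be shown. Below is a minimal sketch of one way to do this, not a definitive fix. It uses `collections.Counter` from the standard library in place of `FreqDist` so that it runs without downloading the corpus. The names `top_words`, `sentence_vectors` and `toy_sents` are hypothetical and exist only for illustration.

```python
from collections import Counter


def top_words(sents, n=50):
    """Return the n most frequent lowercased tokens across all sentences."""
    counts = Counter(w.lower() for sent in sents for w in sent)
    # most_common() yields (word, count) pairs; keep only the words.
    return [word for word, _ in counts.most_common(n)]


def sentence_vectors(sents, word_features):
    """Yield one dict per sentence: 1 if the word occurs in it, else 0."""
    for sent in sents:
        # Lowercase the sentence tokens so they match the lowercased features.
        tokens = {w.lower() for w in sent}
        yield {word: int(word in tokens) for word in word_features}


# Toy stand-in (an assumption) for brown.sents(categories='news').
toy_sents = [
    ["The", "jury", "said", "the", "election", "was", "fair"],
    ["The", "mayor", "disagreed"],
    ["He", "said", "so"],
]

toy_features = top_words(toy_sents, n=2)
toy_vectors = list(sentence_vectors(toy_sents, toy_features))

# Print one vector per sentence, inside the loop.
for vector in toy_vectors:
    print(vector)
```

With NLTK installed and the Brown corpus downloaded, the same functions should accept `brown.sents(categories='news')` directly. In the original code, `[w for w, _ in fdist.most_common(50)]` would play the role of `top_words`, since `FreqDist` also provides `most_common`.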

0 answers:

No answers