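I want to get the 50 most frequent words from a corpus and then check whether each of those words occurs in a sentence. I want to loop over all the sentences and print a vector for each one (1 if the word is in the sentence, 0 otherwise). I wrote the code below, but it only ever shows 0 (False). Any ideas?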
import nltk
from nltk import FreqDist
from nltk.corpus import brown
news = brown.words(categories='news')
news_sents = brown.sents(categories='news')
fdist = FreqDist(w.lower() for w in news)
word_features = list(fdist.values())[:50]
num_sents = len(news.sents(fileid))
for i in range(num_sents):
    features = {}
    for word in word_features:
        features[word] = int(word in news_sents[i])
    print(features)
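
A likely cause, as far as I can tell from the snippet: `fdist.values()` returns the counts rather than the words, so `word_features` holds integers and `word in news_sents[i]` can never be true. The sentences are also not lowercased, while the frequency distribution is. Below is a minimal sketch of the intended logic. It uses `collections.Counter` as a stand-in (NLTK's `FreqDist` subclasses it and offers the same `most_common` method) and a small made-up list of sentences in place of `brown.sents(categories='news')`. The names `top_words` and `sentence_vector` are hypothetical helpers, not part of NLTK.

```python
from collections import Counter

# Toy stand-in for brown.sents(categories='news'); made-up data for illustration.
sents = [
    ["The", "jury", "said", "the", "election", "was", "fair"],
    ["The", "mayor", "praised", "the", "jury"],
    ["The", "jury", "was", "dismissed"],
    ["No", "comment"],
]


def top_words(sentences, n):
    """Return the n most frequent lowercased words (the words, not their counts)."""
    counts = Counter(w.lower() for sent in sentences for w in sent)
    return [word for word, _count in counts.most_common(n)]


def sentence_vector(sentence, vocabulary):
    """Return 1 for each vocabulary word present in the sentence, else 0."""
    present = {w.lower() for w in sentence}  # lowercase to match the vocabulary
    return [int(word in present) for word in vocabulary]


word_features = top_words(sents, 3)  # use 50 on the real corpus
vectors = [sentence_vector(s, word_features) for s in sents]

for vector in vectors:
    print(vector)
```

With the real corpus, the same idea would presumably be `word_features = [w for w, _ in fdist.most_common(50)]`, followed by a loop directly over `news_sents` instead of indexing by `i`.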