Question

I have a sequence of words, and I want to remove all the stopwords from it using nltk. Here is the code snippet:

# tokensgenerated holds the sequence of words
for word in tokensgenerated:
    if word not in nltk.corpus.stopwords.words('english'):
        pass  # do something with the word

However, I get a runtime error. The traceback ends with:

"except LookupError: raise e"

I have imported nltk.

What am I missing?

Answer 1

You first have to download the stopwords corpus and make sure the download succeeded; see http://www.nltk.org/data:

>>> import nltk
>>> from nltk.corpus import stopwords
>>> packages = ['stopwords']
>>> nltk.download(packages)  # fetches the corpus into the local nltk_data directory
>>>
>>> stop = stopwords.words('english')
>>> sent = 'this is a foobar sentence'.split()
>>> [word for word in sent if word not in stop]
['foobar', 'sentence']
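
The loop in the question also rebuilds the stopword list for every word it checks. Below is a minimal sketch of one way to combine the download step with the filtering, assuming nltk is installed and the machine can reach the download server on first use. The helper names `load_english_stopwords` and `remove_stopwords` are made up for illustration and are not part of nltk.

```python
def load_english_stopwords():
    """Return NLTK's English stopword list, downloading the corpus if missing.

    Hypothetical helper: nltk is imported here so that the filtering
    function below can be used without it.
    """
    import nltk
    from nltk.corpus import stopwords

    try:
        return stopwords.words('english')
    except LookupError:
        # The corpus is not installed locally yet: fetch it, then retry.
        nltk.download('stopwords')
        return stopwords.words('english')


def remove_stopwords(tokens, stop_words):
    """Return the tokens that do not appear in stop_words, in order."""
    # Build the set once so each membership test is a fast hash lookup
    # instead of a scan over the whole list.
    stop_set = set(stop_words)
    return [token for token in tokens if token not in stop_set]
```

With these in place, `remove_stopwords(tokensgenerated, load_english_stopwords())` would replace the loop from the question. The comparison is case-sensitive, as in the original code, and the entries in NLTK's English list are lowercase, so tokens such as 'This' pass through unless they are lowercased first.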
