Unable to remove stop words from a sequence of words using nltk

Time: 2014-03-15 22:12:25

Tags: python nltk stop-words

I have a sequence of words and I want to remove all the stop words from it using nltk. Here is the code snippet:

# tokensgenerated holds the sequence of words
for word in tokensgenerated:
    if word not in nltk.corpus.stopwords.words('english'):
        pass  # do something with the word

However, I get a runtime error:

"except LookupError: raise e"

I have imported nltk.

What am I missing?

1 answer:

Answer 0: (score: 0)

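The LookupError means NLTK cannot find the stopwords corpus on your machine; importing nltk does not install its data. First download the stopwords corpus and make sure the download succeeded, see http://www.nltk.org/data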

>>> import nltk
>>> from nltk.corpus import stopwords
>>> nltk.download('stopwords')  # fetches the corpus; download output omitted here
>>>
>>> stop = stopwords.words('english')
>>> sent = 'this is a foobar sentence'.split()
>>> [word for word in sent if word not in stop]
['foobar', 'sentence']
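
If you want the script to recover on its own when the corpus is missing, something along these lines might work. It is only a sketch: the helper names `load_english_stopwords` and `remove_stopwords` are made up for illustration, and the lowercase comparison is my assumption about what you want, not something from your code. It also builds the stop word set once instead of calling `stopwords.words('english')` for every token, which is what the loop in the question does.

```python
def load_english_stopwords():
    """Return NLTK's English stop words as a set, downloading the corpus if needed."""
    # Imported here so that remove_stopwords below can be used without NLTK installed.
    import nltk
    from nltk.corpus import stopwords

    try:
        return set(stopwords.words('english'))
    except LookupError:
        # The corpus is not installed yet: fetch it, then try again.
        nltk.download('stopwords')
        return set(stopwords.words('english'))


def remove_stopwords(tokens, stop_words):
    """Return the tokens that are not stop words (hypothetical helper).

    Assumption: tokens are compared in lowercase, so 'This' is treated like 'this'.
    """
    stop_set = {w.lower() for w in stop_words}  # set membership tests are fast
    return [t for t in tokens if t.lower() not in stop_set]
```

You would then call `remove_stopwords(tokensgenerated, load_english_stopwords())` once and loop over the returned list. The download step needs network access the first time it runs.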