I have a sequence of words and I want to remove all the stop words from it using nltk. The code snippet is given below:
# tokensgenerated holds the sequence of words
for word in tokensgenerated:
    if word not in nltk.corpus.stopwords.words('english'):
        pass  # do something with the word
However, I am getting a runtime error:
"except LookupError: raise e"
I have imported nltk.
What am I missing?
Answer 0 (score: 0)
First, download the stopwords corpus and make sure the download succeeded (see http://www.nltk.org/data):
>>> import nltk
>>> from nltk.corpus import stopwords
>>> nltk.download('stopwords', quiet=True)  # fetch the corpus if it is missing
True
>>> stop = stopwords.words('english')
>>> sent = 'this is a foobar sentence'.split()
>>> [word for word in sent if word not in stop]
['foobar', 'sentence']
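If the filtering runs in a script rather than an interactive session, the download step and the loop from the question can be combined roughly as follows. This is only a sketch under a few assumptions: the helper names `load_english_stopwords` and `remove_stopwords` are made up for illustration and are not part of NLTK, the tokens are lowercased before the comparison (the NLTK English list is lowercase), and the stop words are kept in a set so the list is read once instead of on every iteration.

```python
def load_english_stopwords():
    """Return NLTK's English stop words as a set, downloading the corpus if needed."""
    # Imported here so the filtering helper below has no NLTK dependency.
    import nltk
    from nltk.corpus import stopwords

    try:
        words = stopwords.words('english')
    except LookupError:
        # The corpus is not installed yet: fetch it once, then retry.
        nltk.download('stopwords')
        words = stopwords.words('english')
    return set(words)


def remove_stopwords(tokens, stop_words):
    """Keep only the tokens whose lowercase form is not in stop_words."""
    return [t for t in tokens if t.lower() not in stop_words]
```

With the corpus available, `remove_stopwords(tokensgenerated, load_english_stopwords())` should give the filtered list. Whether to lowercase before comparing depends on the data; drop the `.lower()` call if case has to be matched exactly.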