NLTK: what should I download to use sent_tokenize

Time: 2018-10-31 22:11:59

Tags: python nltk

I am trying to use nltk's sent_tokenize(), so I have downloaded the following:

import nltk
nltk.download("stopwords")
nltk.download("punkt")

from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords

# tokenize sentences
sentences = [sent for sent in sent_tokenize(data, "russian")]

But it gives me this error:

LookupError: 
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:

  import nltk
  nltk.download('punkt')

  Searched in:
- '/Users/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- '/Library/Frameworks/Python.framework/Versions/3.6/nltk_data'
- '/Library/Frameworks/Python.framework/Versions/3.6/share/nltk_data'
- '/Library/Frameworks/Python.framework/Versions/3.6/lib/nltk_data'

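I don't understand why, because I have already downloaded it. I also tried calling nltk.download() with no arguments, but I don't have much RAM, so it runs too slowly. What should I change to fix this?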

1 answer:

Answer 0: (score: 0)

You can try

nltk.download("popular")

It downloads NLTK's most commonly used packages, including the punkt tokenizer and the stopwords corpus.
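
If downloading the whole "popular" collection is too heavy, another option is to check whether punkt actually landed in one of the directories listed under "Searched in:" and, if not, to download only that one package into a directory you choose. The sketch below is a rough illustration, not a confirmed fix for this exact setup: find_punkt_dirs and ensure_punkt are hypothetical helper names, the download directory is whatever you pass in, and it assumes the resource is called punkt as in the error above (use whichever name your own LookupError reports). nltk.data.path and the download_dir argument of nltk.download are part of NLTK itself.

```python
from pathlib import Path


def find_punkt_dirs(search_paths):
    """Return the search paths that hold tokenizers/punkt.

    Hypothetical helper: it only looks on disk for the unzipped
    directory or the downloaded zip archive, it does not load anything.
    """
    found = []
    for base in search_paths:
        tokenizers = Path(base) / "tokenizers"
        if (tokenizers / "punkt").is_dir() or (tokenizers / "punkt.zip").is_file():
            found.append(str(base))
    return found


def ensure_punkt(download_dir):
    """Make sure punkt is available, downloading only that package if needed.

    Hypothetical helper; download_dir is an assumption, pick any
    directory you can write to.
    """
    import nltk  # imported here so find_punkt_dirs works without NLTK installed

    # Tell NLTK to search the chosen directory as well.
    if download_dir not in nltk.data.path:
        nltk.data.path.append(download_dir)

    # Nothing to do if one of the search paths already contains punkt.
    if find_punkt_dirs(nltk.data.path):
        return True

    # Download just this one package instead of a whole collection.
    return nltk.download("punkt", download_dir=download_dir)
```

Calling ensure_punkt with a writable directory before sent_tokenize should either confirm that punkt is already on the search path or fetch it into that directory. If find_punkt_dirs(nltk.data.path) returns an empty list right after nltk.download("punkt") reported success, the download probably went to a location that is not on the search path.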