I am trying to use NLTK's sent_tokenize(), so I downloaded the resources it needs:
import nltk
nltk.download("stopwords")
nltk.download("punkt")
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
# tokenize sentences
sentences = [sent for sent in sent_tokenize(data, "russian")]
But it gives me this error:
LookupError:
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
import nltk
nltk.download('punkt')
Searched in:
- '/Users/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- '/Library/Frameworks/Python.framework/Versions/3.6/nltk_data'
- '/Library/Frameworks/Python.framework/Versions/3.6/share/nltk_data'
- '/Library/Frameworks/Python.framework/Versions/3.6/lib/nltk_data'
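But I don't understand why, since I have already downloaded it.
I also tried running nltk.download() with no arguments, but I don't have much RAM, so it runs too slowly.
What should I change here to fix this?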
Answer 0 (score: 0)
You can try:
nltk.download("popular")
This downloads NLTK's most commonly used packages, including the punkt tokenizer and the stopwords corpus.
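If the download seems to succeed but the LookupError still appears, one possible cause is that the data was saved to a directory that is not in the "Searched in" list. Below is a minimal sketch for checking and fixing that, not a definitive solution. It assumes a helper I am calling ensure_nltk_resource (a hypothetical name, not part of NLTK); the only NLTK calls it relies on are nltk.data.path, nltk.data.find and nltk.download. Note that newer NLTK releases look for "punkt_tab" rather than "punkt", so the resource names may need adjusting for your version.

```python
import os


def ensure_nltk_resource(resource_path, package, download_dir=None):
    """Return True if NLTK can locate the resource, downloading it once if needed.

    Hypothetical helper for illustration; not part of NLTK itself.
    resource_path: path as used by nltk.data.find, e.g. "tokenizers/punkt"
    package:       id as used by nltk.download, e.g. "punkt"
    download_dir:  optional directory to store the data and add to the search path
    """
    import nltk  # imported at call time

    # Make NLTK search the custom directory as well, if one was given.
    if download_dir is not None:
        os.makedirs(download_dir, exist_ok=True)
        if download_dir not in nltk.data.path:
            nltk.data.path.append(download_dir)

    try:
        nltk.data.find(resource_path)
        return True
    except LookupError:
        # Not found in any directory on nltk.data.path: fetch only this package.
        nltk.download(package, download_dir=download_dir, quiet=True)

    # Check again so a failed or misplaced download is reported, not hidden.
    try:
        nltk.data.find(resource_path)
        return True
    except LookupError:
        return False
```

You would call it before tokenizing, for example ensure_nltk_resource("tokenizers/punkt", "punkt", download_dir="/path/to/nltk_data"), where the directory is a placeholder for a location you can write to. Fetching a single package this way also avoids the full nltk.download() downloader that was too slow in the question.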