Question

Here is my code. It just does some tokenization with nltk:

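import nltk
from nltk.corpus import stopwords

# doc is the input text (a str) defined earlier in my script
tokens = nltk.word_tokenize(doc, language='english')
# remove all the stopwords and any tokens that are not purely alphanumeric
stop_words = set(stopwords.words('english'))  # build the lookup set once
filtered = [w for w in tokens if w not in stop_words and w.isalnum()]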
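Only the `nltk.word_tokenize` call needs the punkt model. The filtering step is plain Python, so it can be checked on its own. A minimal sketch of that step as a standalone function (the name `filter_tokens`, the hand-written token list and the tiny stopword set are illustrative assumptions, not NLTK's own list) shows two things the list comprehension does: the stopword test is case-sensitive, and `isalnum()` drops punctuation and contraction pieces such as `n't`.

```python
def filter_tokens(tokens, stop_words):
    """Keep tokens that are not in stop_words and consist only of letters/digits.

    Mirrors the list comprehension in the question. stop_words should be a set
    so each membership test is a cheap hash lookup.
    """
    return [w for w in tokens if w not in stop_words and w.isalnum()]


# Hand-written tokens of the kind word_tokenize returns (illustrative only).
sample_tokens = ["This", "is", "n't", "a", "test", "."]
# Hypothetical lowercase stopword set, NOT the real NLTK English list.
sample_stop_words = {"this", "is", "a"}

print(filter_tokens(sample_tokens, sample_stop_words))  # ['This', 'test']
```

"This" survives because the comparison is case-sensitive; comparing `w.lower()` against the set is one option if that is not the intended behavior.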

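I have already downloaded the punkt package. I also tried copying the correct folder into the locations the error message says it searched. This is the error, which I have also seen in other similar questions: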

Resource u'tokenizers/punkt/english.pickle' not found.
Please use the NLTK Downloader to obtain the resource:  >>> nltk.download()
Searched in:

- '/root/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- u''
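Copying a folder only helps if the exact relative path `tokenizers/punkt/english.pickle` ends up under one of the searched directories. A minimal standard-library sketch (the helper names `dirs_with_resource` and `make_fake_data_dir` are hypothetical) that reports which directories actually contain that file, demonstrated on a temporary directory instead of the real system paths:

```python
import os
import tempfile

# Relative path from the error message, resolved under each searched directory.
RESOURCE = os.path.join("tokenizers", "punkt", "english.pickle")


def dirs_with_resource(search_dirs, resource=RESOURCE):
    """Return the entries of search_dirs that contain `resource` as a regular file.

    Only the unzipped layout is checked, and empty entries such as '' are skipped.
    """
    return [d for d in search_dirs if d and os.path.isfile(os.path.join(d, resource))]


def make_fake_data_dir(parent):
    """Create <parent>/nltk_data/tokenizers/punkt/english.pickle as an empty stand-in."""
    data_dir = os.path.join(parent, "nltk_data")
    os.makedirs(os.path.join(data_dir, "tokenizers", "punkt"))
    open(os.path.join(data_dir, RESOURCE), "wb").close()
    return data_dir


with tempfile.TemporaryDirectory() as tmp:
    good = make_fake_data_dir(tmp)
    # Hypothetical bad copy: punkt/ placed directly, without the tokenizers/ level.
    wrong = os.path.join(tmp, "copied_too_shallow")
    os.makedirs(os.path.join(wrong, "punkt"))
    open(os.path.join(wrong, "punkt", "english.pickle"), "wb").close()
    found = dirs_with_resource([good, wrong, ""])
    print(found == [good])  # True
```

On the affected machine, the same function can be called with the directories from the error message plus `/home/user/nltk_data`, or with `nltk.data.path`.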

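I even tried reinstalling nltk and all of its packages, but it didn't help. Some useful information about the environment:

- running through the terminal in the PyCharm IDE
- operating system: Ubuntu 15
- nltk installed with pip
- nltk_data installed in the default location, /home/user/nltk_data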
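Comparing this with the error, the install location `/home/user/nltk_data` does not appear among the searched directories, while the first one is `/root/nltk_data`, which suggests the Python process is most likely running as root. One way around that is to add the real location to `nltk.data.path` (a plain Python list) before tokenizing. A minimal sketch of a helper (the name `add_data_dir` is hypothetical) that appends a directory only if an equivalent entry is not already there, exercised on a stand-in list copied from the error:

```python
import os


def add_data_dir(search_path, directory):
    """Append `directory` to `search_path` (e.g. nltk.data.path) unless an
    equivalent entry is already present. Mutates and returns the same list."""
    directory = os.path.normpath(os.path.expanduser(directory))
    existing = {os.path.normpath(p) for p in search_path if p}
    if directory not in existing:
        search_path.append(directory)
    return search_path


# Stand-in for nltk.data.path, copied from the "Searched in" list in the error.
searched = [
    "/root/nltk_data",
    "/usr/share/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/lib/nltk_data",
    "/usr/local/lib/nltk_data",
]
add_data_dir(searched, "/home/user/nltk_data/")  # trailing slash is normalized away
print(searched[-1])
```

With NLTK itself, the call would be `add_data_dir(nltk.data.path, '/home/user/nltk_data')` placed before `nltk.word_tokenize(...)`; a bare `nltk.data.path.append(...)` does the same without the duplicate check. Alternatively, the NLTK data documentation describes the `NLTK_DATA` environment variable for pointing at a non-default location, e.g. `export NLTK_DATA=/home/user/nltk_data` in the shell that starts Python.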

Please don't tell me to use nltk.download('punkt'), because I already have it. Thanks for your help.

Answer 1

You have to install the nltk punkt package before you can tokenize.

How?
1. Open a terminal.
2. Run the python command to enter the Python interpreter.
3. Run import nltk
4. Run nltk.download('punkt')

Your terminal may look something like this:

[Screenshot: Python nltk resource u'tokenizers/punkt/english.pickle' not found, but it actually exists]
