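I am working through the nltk tutorial, which requires downloading various corpora. Because of the problem I was facing downloading NLTK corpora with nltk.download(), and since none of the suggested solutions fixed it, I followed the steps described here.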
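I started downloading whatever corpora each example needed from this page and placing them in the directory D:\nltk_data\corpora. That let me run various examples. However, one example gave me this error: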
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
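So I downloaded punkt from the same page and copied it into the same directory, but that did not work. I also tried from nltk.corpus import punkt, as with the other corpora, but that did not work either: it reports Unresolved import: punkt.
One difference between punkt and the other corpora is that punkt contains pickle files, whereas the other corpora contain text files. How can I fix this?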
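For context on the pickle difference: a .pickle file is a serialized Python object that gets deserialized from a file path, not a Python module, and punkt is not one of the corpora exposed by nltk.corpus, so importing it that way cannot work. Below is a minimal stdlib-only illustration of that load path. The dict is a hypothetical stand-in for the trained tokenizer object, and the tokenizers/punkt/english.pickle layout is an assumption based on how NLTK 3.x releases of that period stored the English punkt model.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model; the real punkt file is assumed
# to hold a tokenizer object, not a dict.
model = {"abbrev_types": {"dr", "mr"}, "language": "english"}

with tempfile.TemporaryDirectory() as root:
    # Mirror an assumed category/name/file layout: tokenizers/punkt/english.pickle
    model_dir = os.path.join(root, "tokenizers", "punkt")
    os.makedirs(model_dir)
    model_path = os.path.join(model_dir, "english.pickle")

    with open(model_path, "wb") as f:
        pickle.dump(model, f)

    # Loading is a file read plus deserialization, not a Python import.
    with open(model_path, "rb") as f:
        loaded = pickle.load(f)

print(loaded == model)  # True
```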
Code:
import nltk
from nltk.corpus import gutenberg
for fileid in gutenberg.fileids():
    num_chars = len(gutenberg.raw(fileid))
    num_words = len(gutenberg.words(fileid))
    num_sents = len(gutenberg.sents(fileid))
    num_vocab = len(set(w.lower() for w in gutenberg.words(fileid)))
    print(round(num_chars/num_words), round(num_words/num_sents), round(num_words/num_vocab), fileid)
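Per file, the loop prints average word length, average sentence length in words, and lexical diversity (how often each vocabulary item is used on average). Only the sentence count depends on a sentence tokenizer, which fits the traceback starting at the num_sents line. The stdlib-only sketch below computes the same three ratios on a plain string, with naive regex splitting swapped in for NLTK's tokenizers, so its numbers will not match gutenberg.words()/sents(); text_stats is a hypothetical helper, not an NLTK function.

```python
import re


def text_stats(raw):
    """Return (avg word length, avg sentence length, lexical diversity).

    Naive regex tokenization stands in for NLTK's tokenizers, so results
    differ from the corpus reader. Assumes raw contains at least one word.
    """
    # Words: runs of word characters, or runs of punctuation.
    words = re.findall(r"\w+|[^\w\s]+", raw)
    # Sentences: split after ., ! or ? followed by whitespace (no punkt model).
    sents = [s for s in re.split(r"(?<=[.!?])\s+", raw.strip()) if s]
    vocab = set(w.lower() for w in words)

    num_chars = len(raw)
    num_words = len(words)
    num_sents = len(sents)
    return (round(num_chars / num_words),
            round(num_words / num_sents),
            round(num_words / len(vocab)))


print(text_stats("The cat sat. The cat ran! Did the dog run?"))  # (3, 4, 1)
```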
Error:
Traceback (most recent call last):
  File "D:\Mahesh\workspaces\pyworkspace\nltkdemo\chp2\chp2.py", line 8, in <module>
    num_sents = len(gutenberg.sents(fileid))
  File "D:\Softwares\python\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\nltk\corpus\reader\util.py", line 233, in __len__
    for tok in self.iterate_from(self._toknum[-1]): pass
  File "D:\Softwares\python\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\nltk\corpus\reader\util.py", line 296, in iterate_from
    tokens = self.read_block(self._stream)
  File "D:\Softwares\python\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\nltk\corpus\reader\plaintext.py", line 129, in _read_sent_block
    for sent in self._sent_tokenizer.tokenize(para)])
  File "D:\Softwares\python\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\nltk\data.py", line 984, in __getattr__
    self.__load()
  File "D:\Softwares\python\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\nltk\data.py", line 976, in __load
    resource = load(self._path)
  File "D:\Softwares\python\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\nltk\data.py", line 836, in load
    opened_resource = _open(resource_url)
  File "D:\Softwares\python\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\nltk\data.py", line 954, in _open
    return find(path_, path + ['']).open()
  File "D:\Softwares\python\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\nltk\data.py", line 675, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  Searched in:
    - 'C:\\Users\\593932/nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
    - 'D:\\Softwares\\python\\WinPython-64bit-3.4.4.4Qt5\\python-3.4.4.amd64\\nltk_data'
    - 'D:\\Softwares\\python\\WinPython-64bit-3.4.4.4Qt5\\python-3.4.4.amd64\\share\\nltk_data'
    - 'D:\\Softwares\\python\\WinPython-64bit-3.4.4.4Qt5\\python-3.4.4.amd64\\lib\\nltk_data'
    - 'C:\\Users\\Mahesha999\\AppData\\Roaming\\nltk_data'
    - ''
**********************************************************************
The error seems to occur at line 8 of chp2.py: num_sents = len(gutenberg.sents(fileid))
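The "Searched in" list is a set of data roots that NLTK tries in order, appending the same relative resource path to each. The stdlib-only sketch below mimics that search with a hypothetical locate_resource helper; it ignores the zipped-package handling and other details of the real nltk.data.find. The resource path tokenizers/punkt/english.pickle is an assumption about what the sentence tokenizer requests in NLTK 3.x releases of that period. Under that assumption, a punkt folder placed under a corpora subdirectory would not satisfy the lookup.

```python
import os
import tempfile


def locate_resource(roots, resource):
    """Return the first existing root + resource path, else raise LookupError.

    Hypothetical helper: each root is tried in order with the same relative
    resource path appended, loosely mirroring the 'Searched in' list.
    """
    parts = resource.split("/")
    searched = []
    for root in roots:
        candidate = os.path.join(root, *parts)
        if os.path.exists(candidate):
            return candidate
        searched.append(root)
    raise LookupError(f"{resource} not found; searched: {searched}")


with tempfile.TemporaryDirectory() as root:
    # punkt placed under corpora/ does not match a tokenizers/punkt/... lookup.
    os.makedirs(os.path.join(root, "corpora", "punkt"))
    try:
        locate_resource([root], "tokenizers/punkt/english.pickle")
        found_under_corpora = True
    except LookupError:
        found_under_corpora = False

    # The same file under tokenizers/punkt/ is found.
    target = os.path.join(root, "tokenizers", "punkt")
    os.makedirs(target)
    open(os.path.join(target, "english.pickle"), "wb").close()
    hit = locate_resource([root], "tokenizers/punkt/english.pickle")
    found_under_tokenizers = hit.endswith("english.pickle")

print(found_under_corpora, found_under_tokenizers)  # False True
```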