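I am new to Python and NLTK. I want to do word tokenization and POS tagging. I installed NLTK 3.0 on Ubuntu 14.04, which ships Python 2.7.6 by default. I started by trying to tokenize a simple sentence, but I got an error saying "BadZipfile: File is not a zip file". How can I fix this?
I have a second question. When I installed the NLTK data from the command line, I gave the path "/usr/share/nltk_data". Some packages failed to install because of errors. But when I run "nltk.data.path", it shows other paths as well, and those other paths do not actually exist. Why?
I also have 1000 text files. How do I write Python code that takes that many files as input and runs tokenization and POS tagging on all of them? I have no idea where to start, so any help is appreciated.
The commands I ran in the Python interpreter are shown below, in the order I ran them: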
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "copyright", "credits" or "license()" for more information.
>>> import nltk
>>> nltk.data.path
['/home/ubuntu/nltk_data', '/usr/share/nltk_data', '/usr/local/share/nltk_data', '/usr/lib/nltk_data', '/usr/local/lib/nltk_data']
>>> from nltk import pos_tag, word_tokenize
>>> sentence = "Hello my name is Derek. I live in Salt Lake city."
>>> sentence
'Hello my name is Derek. I live in Salt Lake city.'
>>> word_tokenize(sentence)
Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
    word_tokenize(sentence)
  File "/usr/local/lib/python2.7/dist-packages/nltk/tokenize/__init__.py", line 93, in word_tokenize
    return [token for sent in sent_tokenize(text)
  File "/usr/local/lib/python2.7/dist-packages/nltk/tokenize/__init__.py", line 81, in sent_tokenize
    tokenizer = load('tokenizers/punkt/english.pickle')
  File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 774, in load
    opened_resource = _open(resource_url)
  File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 888, in _open
    return find(path_, path + ['']).open()
  File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 605, in find
    return find(modified_name, paths)
  File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 592, in find
    return ZipFilePathPointer(p, zipentry)
  File "/usr/local/lib/python2.7/dist-packages/nltk/compat.py", line 380, in _decorator
    return init_func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 449, in __init__
    zipfile = OpenOnDemandZipFile(os.path.abspath(zipfile))
  File "/usr/local/lib/python2.7/dist-packages/nltk/compat.py", line 380, in _decorator
    return init_func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 946, in __init__
    zipfile.ZipFile.__init__(self, filename)
  File "/usr/lib/python2.7/zipfile.py", line 770, in __init__
    self._RealGetContents()
  File "/usr/lib/python2.7/zipfile.py", line 811, in _RealGetContents
    raise BadZipfile, "File is not a zip file"
BadZipfile: File is not a zip file
>>>
Thanks in advance.
Answer 0 (score: 2)
You apparently haven't run download_corpora.py (successfully).
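The traceback fails while opening a zip file for the punkt tokenizer, which fits a download that was interrupted or only partly written. A minimal sketch for locating such files follows. The helper name find_bad_zips is hypothetical and not part of NLTK; it uses only the standard library and expects a list of directories such as nltk.data.path. That list is only the set of locations NLTK searches, so entries that do not exist on disk are normal and are skipped here.

```python
import os
import zipfile


def find_bad_zips(search_dirs):
    """Return the *.zip files under search_dirs that are not valid zip archives.

    find_bad_zips is a hypothetical helper for illustration, not an NLTK function.
    """
    bad = []
    for root in search_dirs:
        if not os.path.isdir(root):
            # Search-path entries that do not exist on disk are skipped.
            continue
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if not name.lower().endswith(".zip"):
                    continue
                full = os.path.join(dirpath, name)
                # is_zipfile checks for the zip signature without extracting.
                if not zipfile.is_zipfile(full):
                    bad.append(full)
    return sorted(bad)


# Typical use (requires NLTK to be installed):
#   import nltk
#   for path in find_bad_zips(nltk.data.path):
#       print(path)
```

If this reports something like a punkt.zip, deleting that file and running nltk.download('punkt') again should fetch a fresh copy, assuming the original download was the problem.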
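Answer 1 (score: 0)
I solved the same problem by following this question on Stack Overflow.
Basically, check your NLTK version. If it is higher than v3.2, run the following command: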
nltk.download('averaged_perceptron_tagger')
That worked for me.
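For the part of the question about 1000 text files, one possible approach is to loop over the files and tag each one once the NLTK data loads correctly. This is only a sketch under assumptions: the function name tag_files, the "*.txt" pattern and the UTF-8 encoding are all illustrative choices, not anything stated in the question. By default it calls nltk.word_tokenize and nltk.pos_tag, and the tokenizer and tagger can be swapped for other callables.

```python
import glob
import io
import os


def tag_files(directory, tokenize=None, tag=None, pattern="*.txt"):
    """Tokenize and POS-tag every matching file in directory.

    Returns a dict mapping file name to a list of (word, tag) pairs.
    tag_files is a hypothetical helper; the pattern and the UTF-8
    encoding are assumptions about the input files.
    """
    if tokenize is None or tag is None:
        # Imported lazily so the helper can be tried with stand-in functions.
        import nltk
        tokenize = tokenize or nltk.word_tokenize
        tag = tag or nltk.pos_tag
    results = {}
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        with io.open(path, encoding="utf-8") as handle:
            text = handle.read()
        results[os.path.basename(path)] = tag(tokenize(text))
    return results


# Typical use (requires NLTK and its tokenizer and tagger data):
#   tagged = tag_files("/path/to/texts")
```

Holding the results for 1000 files in one dict is fine for small documents; for large ones, writing each file's tags to disk inside the loop would use less memory.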