IOError when loading the NLTK perceptron tagger

Asked: 2016-04-27 08:28:25

Tags: python nltk ironpython

The code is very simple:

import nltk
nltk.data.path.append(r"E:\nltk_data")
nltk.pos_tag(["hello"])

The error is:

  File "C:\Program Files (x86)\IronPython 2.7\lib\site-packages\nltk\tag\__init__.py", line 110, in pos_tag
    tagger = PerceptronTagger()
  File "C:\Program Files (x86)\IronPython 2.7\lib\site-packages\nltk\tag\perceptron.py", line 141, in __init__
    self.load(AP_MODEL_LOC)
  File "C:\Program Files (x86)\IronPython 2.7\lib\site-packages\nltk\tag\perceptron.py", line 209, in load
    self.model.weights, self.tagdict, self.classes = load(loc)
  File "C:\Program Files (x86)\IronPython 2.7\lib\site-packages\nltk\data.py", line 800, in load
    # Load the resource.
  File "C:\Program Files (x86)\IronPython 2.7\lib\site-packages\nltk\data.py", line 921, in _open
    # urllib might not use mode='rb', so handle this one ourselves:
  File "C:\Program Files (x86)\IronPython 2.7\lib\site-packages\nltk\data.py", line 603, in find
    if zipfile is None:
  File "C:\Program Files (x86)\IronPython 2.7\Lib\nturl2path.py", line 26, in url2pathname
    raise IOError, error
IOError: Bad URL: /C|/E|/nltk_data/taggers/averaged_perceptron_tagger/averaged_perceptron_tagger.pickle

Why does the URL turn into /C|/E|/nltk_data/tagg..., and why is url2pathname called in the first place? I am already on Windows, and the path I supplied is a Windows-style path.

2 Answers:

Answer 0: (score: 2)

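I had to dig into the code and eventually found the problem. NLTK determines the operating system with if sys.platform.startswith('win'): (a very professional way to determine it, by the way).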

However, if you are running IronPython, sys.platform is 'cli', so that check fails and NLTK takes the non-Windows code path.
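The "Bad URL" text in the traceback is raised by a drive-letter check in nturl2path.url2pathname. As far as I can tell, a path with two drive markers, like the one in the error message, can never pass it. The sketch below is a simplified paraphrase of that check, not the module's actual source; the function name check_drive_url and the sample paths are hypothetical.

```python
import string


def check_drive_url(url):
    """Hypothetical, simplified paraphrase of a Windows drive-letter check.

    Returns (drive_letter, rest) for a URL path with exactly one drive
    marker, and raises IOError when the drive markers are malformed.
    """
    # Drive colons are written as '|' in this old URL notation.
    url = url.replace(':', '|')
    if '|' not in url:
        # No drive letter at all; nothing to validate in this sketch.
        return (None, url)
    parts = url.split('|')
    # Exactly one '|' is allowed, and it must follow an ASCII letter.
    if len(parts) != 2 or not parts[0] or parts[0][-1] not in string.ascii_letters:
        raise IOError('Bad URL: ' + url)
    return (parts[0][-1].upper(), parts[1])


if __name__ == '__main__':
    print(check_drive_url('///E:/nltk_data'))
    try:
        check_drive_url('/C:/E:/nltk_data/taggers')
    except IOError as exc:
        print(exc)
```

With a single drive letter the check passes; with both C: and E: present the split yields three parts and the error is raised.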

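I suspect this causes a lot of trouble for IronPython users. So the next time any Python package behaves as if it were its Unix version, just check the module for this kind of code.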

Edit: My fix was to replace the check with sys.platform.startswith('win') or sys.platform.startswith('cli')
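A minimal sketch of that widened check, written as a standalone helper so it can be tried without editing NLTK. The name is_windows_like is hypothetical and not part of NLTK. Note the assumption baked in: 'cli' is treated as Windows, which would be wrong if IronPython were running on another operating system.

```python
import sys


def is_windows_like(platform=None):
    """Hypothetical helper: True for 'win32'-style and 'cli' platform strings."""
    if platform is None:
        platform = sys.platform
    # str.startswith accepts a tuple of prefixes, so this is equivalent to
    # platform.startswith('win') or platform.startswith('cli').
    return platform.startswith(('win', 'cli'))


if __name__ == '__main__':
    print(sys.platform, is_windows_like())
```

Passing the platform string as an argument is only there to make the helper easy to test with values such as 'win32', 'cli', or 'linux2'.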

Answer 1: (score: -1)

Your code is escaping \n

Replace \ with \\

import nltk
nltk.data.path.append(r"E:\\nltk_data")
nltk.pos_tag(["hello"])

For more information on how raw string literals work, you can refer to this question: What exactly do "u" and "r" string flags do in Python, and what are raw string literals?
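As a quick illustration of the raw-string behavior that question covers, the snippet below compares the same path written with and without the r prefix. The variable names are only for illustration.

```python
# With the r prefix, backslashes are kept literally.
raw_path = r"E:\nltk_data"

# Without the prefix, "\n" is interpreted as a newline character.
plain_path = "E:\nltk_data"

# Doubling the backslash inside a raw string keeps both backslashes.
doubled_raw_path = r"E:\\nltk_data"

if __name__ == '__main__':
    print(repr(raw_path))
    print(repr(plain_path))
    print(repr(doubled_raw_path))
```

Printing with repr makes the difference visible: only the unprefixed literal contains a newline, and the doubled raw literal contains two backslash characters.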