Error when using the Python TextBlob library's tagger

Time: 2016-03-03 18:58:36

Tags: python python-2.7 nltk textblob

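I had the textblob library working for a while, but then decided to install (using easy_install) an additional library (page here) that claims to provide faster and more accurate tagging.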

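I couldn't get it to work, so I uninstalled it, but it seems to have broken the tagging feature in TextBlob. I have uninstalled and reinstalled both nltk and TextBlob several times, using pip and easy_install, and made sure they are up to date.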

Here is a simple example script that produces the error:

from textblob import TextBlob

blob = TextBlob("This is a sentence")
print repr(blob.tags)

It prints this error:

Traceback (most recent call last):
  File "tesst.py", line 5, in <module>
    print repr(blob.tags)
  File "C:\Users\Emmet\Anaconda\lib\site-packages\textblob\decorators.py", line 24, in __get__
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "C:\Users\Emmet\Anaconda\lib\site-packages\textblob\blob.py", line 445, in pos_tags
    for word, t in self.pos_tagger.tag(self.raw)
  File "C:\Users\Emmet\Anaconda\lib\site-packages\textblob\decorators.py", line 35, in decorated
    return func(*args, **kwargs)
  File "C:\Users\Emmet\Anaconda\lib\site-packages\textblob\en\taggers.py", line 34, in tag
    tagged = nltk.tag.pos_tag(text)
  File "C:\Users\Emmet\Anaconda\lib\site-packages\nltk\tag\__init__.py", line 110, in pos_tag
    tagger = PerceptronTagger()
  File "C:\Users\Emmet\Anaconda\lib\site-packages\nltk\tag\perceptron.py", line 141, in __init__
    self.load(AP_MODEL_LOC)
  File "C:\Users\Emmet\Anaconda\lib\site-packages\nltk\tag\perceptron.py", line 209, in load
    self.model.weights, self.tagdict, self.classes = load(loc)
  File "C:\Users\Emmet\Anaconda\lib\site-packages\nltk\data.py", line 801, in load
    opened_resource = _open(resource_url)
  File "C:\Users\Emmet\Anaconda\lib\site-packages\nltk\data.py", line 924, in _open
    return urlopen(resource_url)
  File "C:\Users\Emmet\Anaconda\lib\urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\Emmet\Anaconda\lib\urllib2.py", line 431, in open
    response = self._open(req, data)
  File "C:\Users\Emmet\Anaconda\lib\urllib2.py", line 454, in _open
    'unknown_open', req)
  File "C:\Users\Emmet\Anaconda\lib\urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "C:\Users\Emmet\Anaconda\lib\urllib2.py", line 1265, in unknown_open
    raise URLError('unknown url type: %s' % type)
urllib2.URLError: <urlopen error unknown url type: c>

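As you can see, the error actually mentions the perceptron tagger. Is there a way to more thoroughly remove any references that the alternative tagger may have left behind?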

Also note that only the "tags" feature is affected.

2 answers:

Answer 0 (score: 2)

This appears to be a problem with nltk version 3.2. Until it is fixed in a release, you can use this hack: NLTK v3.2: Unable to nltk.pos_tag()
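
For what it's worth, the last line of the traceback ("unknown url type: c") suggests the Windows path to the tagger model is being handed to a URL opener, which reads the drive letter as a URL scheme. The sketch below only illustrates that effect with the standard library's urlparse, which stands in for the parsing done inside nltk and urllib2. The path is made up, and load_tagger_with_file_prefix is a hypothetical helper showing the kind of "file:" prefix workaround the linked answer describes; it assumes nltk 3.2 with the averaged_perceptron_tagger data installed and is not called here.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7, as used in the question


def url_scheme(location):
    """Return the scheme a URL parser sees in `location`."""
    return urlparse(location).scheme


def load_tagger_with_file_prefix():
    """Hypothetical helper: load the perceptron tagger from a 'file:' URL.

    Assumes nltk 3.2 and an installed averaged_perceptron_tagger model.
    """
    from nltk.data import find
    from nltk.tag import PerceptronTagger

    pickle_name = "averaged_perceptron_tagger.pickle"
    model_loc = "file:" + str(find("taggers/averaged_perceptron_tagger/" + pickle_name))
    tagger = PerceptronTagger(load=False)  # skip the default load that fails
    tagger.load(model_loc)                 # load from an explicit file: URL
    return tagger


# Made-up path, for illustration only.
windows_path = r"C:\Users\Emmet\model.pickle"

print(url_scheme(windows_path))            # drive letter read as a scheme
print(url_scheme("file:" + windows_path))  # explicit file: scheme
```

The two prints give c and file respectively. If the helper works in your setup, tagger.tag(["This", "is", "a", "sentence"]) would tag a token list without going through nltk.pos_tag.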

Answer 1 (score: 0)

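I found out why I was having trouble with the ap tagger. My issue is solved here. More specifically, by the comment: "Another option is to install nltk and then change 'from textblob.packages import nltk' to 'import nltk' [in the taggers.py] file."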

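(Note that this does not correspond to the error message above: that error appears when the aptagger is not installed. I got a different error when it was installed, and this is the fix for that one.)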