NLTK was unable to find stanford-postagger.jar! Set the CLASSPATH environment variable

Time: 2016-01-11 16:14:27

Tags: python nltk stanford-nlp pos-tagger

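I am working on a project that requires me to POS-tag tokens using NLTK and Python, so I wanted to use the Stanford POS tagger, but I ran into a problem. I have gone through many similar questions here and on other forums, and I still cannot solve it. The problem occurs when I try to execute the following:

from nltk.tag import StanfordPOSTagger
st = StanfordPOSTagger('english-bidirectional-distsim.tagger')

I get the following: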

    Traceback (most recent call last):
      File "<pyshell#13>", line 1, in <module>
        st = StanfordPOSTagger('english-bidirectional-distsim.tagger')
      File "C:\Users\MY3\AppData\Local\Programs\Python\Python35-32\lib\site-packages\nltk-3.1-py3.5.egg\nltk\tag\stanford.py", line 131, in __init__
        super(StanfordPOSTagger, self).__init__(*args, **kwargs)
      File "C:\Users\MY3\AppData\Local\Programs\Python\Python35-32\lib\site-packages\nltk-3.1-py3.5.egg\nltk\tag\stanford.py", line 53, in __init__
        verbose=verbose)
      File "C:\Users\MY3\AppData\Local\Programs\Python\Python35-32\lib\site-packages\nltk-3.1-py3.5.egg\nltk\internals.py", line 652, in find_jar
        searchpath, url, verbose, is_regex))
      File "C:\Users\MY3\AppData\Local\Programs\Python\Python35-32\lib\site-packages\nltk-3.1-py3.5.egg\nltk\internals.py", line 647, in find_jar_iter
        raise LookupError('\n\n%s\n%s\n%s' % (div, msg, div))
    LookupError:

    ===========================================================================
      NLTK was unable to find stanford-postagger.jar! Set the CLASSPATH
      environment variable.

    ===========================================================================

I have already set these environment variables:

CLASSPATH - C:\Users\MY3\Desktop\nltk\stanford\stanford-postagger.jar (I also tried C:\Users\MY3\Desktop\nltk\stanford)

STANFORD_MODELS - C:\Users\MY3\Desktop\nltk\stanford\models\

I also tried the following, in vain:

    File "C:\Python27\lib\site-packages\nltk\tag\stanford.py", line 45, in __init__
        env_vars=('STANFORD_MODELS',), verbose=verbose)

It did not solve the problem either. Please help me resolve this.

I am using Windows 8, Python 3.5 and NLTK 3.1.

2 answers:

Answer 0 (score: 22)

Update

The original answer was written for Stanford POS Tagger Version 3.6.0, Date 2015-12-09.

There is a new version (3.7.0, released 2016-10-31). Here is the code for the new version:

from nltk.tag import StanfordPOSTagger
from nltk import word_tokenize

# Add the jar and model via their path (instead of setting environment variables):
jar = 'your_path/stanford-postagger-full-2016-10-31/stanford-postagger.jar'
model = 'your_path/stanford-postagger-full-2016-10-31/models/english-left3words-distsim.tagger'

pos_tagger = StanfordPOSTagger(model, jar, encoding='utf8')

text = pos_tagger.tag(word_tokenize("What's the airspeed of an unladen swallow ?"))
print(text)
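
Since the LookupError in the question comes from a file that cannot be located, it can help to confirm that both paths point at real files before handing them to StanfordPOSTagger. The following is a minimal sketch using only the standard library; `check_tagger_paths` is a hypothetical helper name of mine, not part of NLTK, and the paths are the same placeholders as above.

```python
import os


def check_tagger_paths(jar, model):
    """Return a list of problems found; an empty list means both files exist."""
    problems = []
    if not os.path.isfile(jar):
        problems.append("jar not found: " + jar)
    if not os.path.isfile(model):
        problems.append("model not found: " + model)
    return problems


if __name__ == "__main__":
    # Placeholder paths copied from the snippet above; replace 'your_path'.
    jar = 'your_path/stanford-postagger-full-2016-10-31/stanford-postagger.jar'
    model = 'your_path/stanford-postagger-full-2016-10-31/models/english-left3words-distsim.tagger'
    for problem in check_tagger_paths(jar, model):
        print(problem)
```

If the list comes back empty and the tagger still fails, the paths themselves are probably not the cause.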

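Original answer

I had the same problem (but using OS X and PyCharm) and finally got it to work. Here is what I pieced together from the StanfordPOSTagger documentation and alvas' work on the issue (many thanks!):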

from nltk.internals import find_jars_within_path
from nltk.tag import StanfordPOSTagger
from nltk import word_tokenize

# Alternatively to setting the CLASSPATH add the jar and model via their path:
jar = '/Users/nischi/PycharmProjects/stanford-postagger-full-2015-12-09/stanford-postagger.jar'
model = '/Users/nischi/PycharmProjects/stanford-postagger-full-2015-12-09/models/english-left3words-distsim.tagger'

pos_tagger = StanfordPOSTagger(model, jar)

# Add other jars from Stanford directory
# (the '/' and ':' separators below assume a Unix-like OS such as OS X)
stanford_dir = pos_tagger._stanford_jar.rpartition('/')[0]
stanford_jars = find_jars_within_path(stanford_dir)
pos_tagger._stanford_jar = ':'.join(stanford_jars)

text = pos_tagger.tag(word_tokenize("What's the airspeed of an unladen swallow ?"))
print(text)
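
The snippet above hard-codes '/' and ':' as separators, which fits OS X but not the Windows setup in the question, where the classpath separator is ';'. Below is a rough, portable sketch of the same jar-gathering step using only the standard library. `collect_jars` and `build_classpath` are hypothetical names of mine; `collect_jars` is just a stand-in for illustration, not NLTK's `find_jars_within_path`.

```python
import os


def collect_jars(directory):
    """Return every .jar file found under directory (subfolders included), sorted."""
    jars = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name.endswith(".jar"):
                jars.append(os.path.join(root, name))
    return sorted(jars)


def build_classpath(main_jar):
    """Join all jars located beside main_jar using the platform's separator."""
    stanford_dir = os.path.dirname(os.path.abspath(main_jar))
    # os.pathsep is ';' on Windows and ':' on OS X / Linux.
    return os.pathsep.join(collect_jars(stanford_dir))
```

Assuming the private attribute used above still exists in your NLTK version, the result could be assigned the same way: `pos_tagger._stanford_jar = build_classpath(jar)`.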

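Hope this helps.

Answer 1 (score: 1)

I use Jupyter Notebook together with PyCharm. I tried setting the environment variables in the PyCharm run configuration, but that did not work. So I set them in code with os.environ: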

import os

from nltk.tag import StanfordPOSTagger

os.environ["CLASSPATH"] = "/yourPath/stanford-parser-full-2018-10-17:yourPath/stanford-postagger-full-2018-10-16:yourPath/stanford-ner-2018-10-16"
os.environ["STANFORD_MODELS"] = "yourPath/stanford-postagger-full-2018-10-16/models:yourPath/stanford-ner-2018-10-16/classifiers"

stanford_tagger = StanfordPOSTagger('english-bidirectional-distsim.tagger')
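
The strings above join several directories with ':', which is the separator on OS X and Linux; on Windows it is ';'. Here is a small sketch of the same idea that lets Python pick the separator and also reports entries that do not exist. `set_stanford_env` is a hypothetical helper of mine, and the optional `environ` parameter is only there so the function can be tried on a plain dict.

```python
import os


def set_stanford_env(jar_dirs, model_dirs, environ=None):
    """Set CLASSPATH and STANFORD_MODELS; return the entries missing on disk.

    jar_dirs and model_dirs are lists of path strings.
    """
    environ = os.environ if environ is None else environ
    # os.pathsep is ';' on Windows and ':' elsewhere.
    environ["CLASSPATH"] = os.pathsep.join(jar_dirs)
    environ["STANFORD_MODELS"] = os.pathsep.join(model_dirs)
    return [p for p in list(jar_dirs) + list(model_dirs) if not os.path.exists(p)]
```

Calling it before constructing the tagger, and printing the returned list, makes a typo in one of the placeholder paths visible right away.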

Hope this helps!