Running the Stanford POS tagger in NLTK gives "not a valid Win32 application" on Windows

Time: 2014-10-30 07:25:20

Tags: nltk stanford-nlp pos-tagger

I am trying to use the Stanford POS tagger in NLTK with the following code:

import nltk
from nltk.tag.stanford import POSTagger
st = POSTagger(r'E:\Assistant\models\english-bidirectional-distsim.tagger',
               r'E:\Assistant\stanford-postagger.jar')
st.tag('What is the airspeed of an unladen swallow?'.split())

This is the output:

Traceback (most recent call last):
  File "E:\J2EE\eclipse\WSNLP\nlp\src\tagger.py", line 5, in <module>
    st.tag('What is the airspeed of an unladen swallow?'.split())
  File "C:\Python34\lib\site-packages\nltk\tag\stanford.py", line 59, in tag
    return self.tag_sents([tokens])[0]
  File "C:\Python34\lib\site-packages\nltk\tag\stanford.py", line 81, in tag_sents
    stdout=PIPE, stderr=PIPE)
  File "C:\Python34\lib\site-packages\nltk\internals.py", line 153, in java
    p = subprocess.Popen(cmd, stdin=stdin, stdout=stdout, stderr=stderr)
  File "C:\Python34\lib\subprocess.py", line 858, in __init__
    restore_signals, start_new_session)
  File "C:\Python34\lib\subprocess.py", line 1111, in _execute_child
    startupinfo)
OSError: [WinError 193] %1 is not a valid Win32 application

P.S. My Java home is set, and my Java installation is fine. Can someone explain what this error means? It tells me nothing. Thanks in advance.

2 Answers:

Answer 0 (score: 0)

Your Java installation looks broken or missing.
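Error 193 (ERROR_BAD_EXE_FORMAT) is what Windows reports when the file handed to CreateProcess, which subprocess.Popen calls under the hood, is not a loadable executable. So one plausible reading of "broken or missing" is that the path NLTK passed as the Java command is not a real java.exe. A quick sanity check, sketched here under that assumption, is to confirm the file exists and starts with the two-byte "MZ" signature every Windows .exe carries. looks_like_windows_exe is a hypothetical helper, and the java.exe path is only an example:

```python
import os

# Every Windows PE executable (.exe/.dll) begins with the two-byte DOS
# signature b"MZ". A file without it cannot be started by CreateProcess,
# which is one way to end up with WinError 193.
PE_SIGNATURE = b"MZ"


def looks_like_windows_exe(path):
    """Return True if `path` is an existing file starting with the MZ signature.

    This is only a heuristic: passing it does not guarantee the file will run
    (e.g. a 64-bit java.exe on 32-bit Windows can fail with error 193 too).
    """
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as f:
        return f.read(2) == PE_SIGNATURE


if __name__ == "__main__":
    # Hypothetical location; adjust to the java.exe you expect NLTK to use.
    candidate = r"C:\Program Files\Java\jre6\bin\java.exe"
    print(candidate, "->", looks_like_windows_exe(candidate))
```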

Answer 1 (score: 0)

After a lot of trial and error, it finally works:

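It seems that NLTK's internals (nltk.internals) cannot find the java binary automatically on Windows, so we need to point NLTK at it ourselves, as follows: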

import os
import nltk
from nltk.tag.stanford import POSTagger
# Tell NLTK where the Java executable lives (the folder containing java.exe)
os.environ['JAVA_HOME'] = r'C:\Program Files\Java\jre6\bin'
# Raw strings (r'...') keep the backslashes in the Windows paths intact
st = POSTagger(r'E:\stanford-postagger-2014-10-26\models\english-left3words-distsim.tagger',
               r'E:\stanford-postagger-2014-10-26\stanford-postagger.jar')
print(st.tag(nltk.word_tokenize('What is the airspeed of an unladen swallow?')))
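Because the error comes from launching something that is not a runnable .exe, it can help to confirm that the directory you put in JAVA_HOME really contains java.exe before calling the tagger. A minimal sketch, using the directory from the answer above; find_java_exe is a hypothetical helper, not part of NLTK, and it makes no claim about exactly where NLTK itself looks:

```python
import os


def find_java_exe(java_home, exe_name="java.exe"):
    """Return the path of `exe_name` in `java_home` or `java_home/bin`, else None.

    Hypothetical helper: it checks the two places a JAVA_HOME value usually
    points at (the bin folder itself, or the JRE/JDK root above it).
    """
    for folder in (java_home, os.path.join(java_home, "bin")):
        candidate = os.path.join(folder, exe_name)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    java_dir = r"C:\Program Files\Java\jre6\bin"  # path from the answer above
    exe = find_java_exe(java_dir)
    if exe:
        # Only set JAVA_HOME once we know java.exe is really there.
        os.environ["JAVA_HOME"] = java_dir
        print("Using", exe)
    else:
        print("java.exe not found under", java_dir)
```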

As one of the gurus told me: "Don't forget to put r in front of any string that contains \."
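The r prefix matters because in an ordinary string literal a backslash starts an escape sequence. The paths in this thread happen to survive, since \A, \e, \m and \s are not recognized escapes (recent Python versions emit a warning for them), but a folder whose name starts with t or n would not. A small illustration, using made-up paths:

```python
# Without the r prefix, Python turns "\t" into a TAB and "\n" into a newline,
# silently corrupting a Windows path like this (made-up) one.
plain = "C:\temp\new_model.tagger"
# With the r prefix, every backslash is kept literally.
raw = r"C:\temp\new_model.tagger"

print(len(plain), len(raw))  # the plain string is shorter: two escapes collapsed
print(repr(plain))
print(repr(raw))
```

Forward slashes (for example 'E:/stanford-postagger-2014-10-26/stanford-postagger.jar') are another option, since Windows accepts them in file paths.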