StanfordPOSTagger not working with NLTK

Date: 2016-04-18 13:59:29

Tags: java python-2.7 nltk stanford-nlp pos-tagger

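I can't get the latest StanfordPOSTagger to work with NLTK 3.2.1 on a Mac that has Java 8 installed. I've found several other threads about very similar problems, but none of their solutions worked for me. Here is the output when I try to tag a sentence: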

>>> import nltk
>>> from nltk.tag.stanford import StanfordPOSTagger
>>> st = StanfordPOSTagger('wsj-0-18-left3words-distsim.tagger')
>>> st.tag(nltk.tokenize.word_tokenize("This is a test"))
Exception in thread "main" java.lang.UnsupportedClassVersionError: edu/stanford/nlp/tagger/maxent/MaxentTagger : Unsupported major.minor version 52.0
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClassCond(ClassLoader.java:637)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:621)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
	at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:247)

Traceback (most recent call last):
  File "<pyshell#44>", line 1, in <module>
    st.tag(nltk.tokenize.word_tokenize("This is a test"))
  File "/Library/Python/2.7/site-packages/nltk/tag/stanford.py", line 71, in tag
    return sum(self.tag_sents([tokens]), [])
  File "/Library/Python/2.7/site-packages/nltk/tag/stanford.py", line 94, in tag_sents
    stdout=PIPE, stderr=PIPE)
  File "/Library/Python/2.7/site-packages/nltk/internals.py", line 134, in java
    raise OSError('Java command failed : ' + str(cmd))
OSError: Java command failed : ['/Library/Java/Home/bin/java', '-mx1000m', '-cp', '/Users/johntorr/VirtualAssistantProject/stanford-postagger/stanford-postagger-3.6.0-javadoc.jar:/Users/johntorr/VirtualAssistantProject/stanford-postagger/stanford-postagger-3.6.0-sources.jar:/Users/johntorr/VirtualAssistantProject/stanford-postagger/stanford-postagger-3.6.0.jar:/Users/johntorr/VirtualAssistantProject/stanford-postagger/stanford-postagger.jar:/Users/johntorr/VirtualAssistantProject/stanford-postagger/lib/slf4j-api.jar:/Users/johntorr/VirtualAssistantProject/stanford-postagger/lib/slf4j-simple.jar', 'edu.stanford.nlp.tagger.maxent.MaxentTagger', '-model', '/Users/johntorr/VirtualAssistantProject/stanford-postagger/models/wsj-0-18-left3words-distsim.tagger', '-textFile', '/var/folders/gy/bw2lj_wj79x9vl1l3n3ccg980000gn/T/tmp6yV_lP', '-tokenize', 'false', '-outputFormatOptions', 'keepEmptySentences', '-encoding', 'utf8']
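The key line in the Java output is `Unsupported major.minor version 52.0`. Class-file major version 52 corresponds to Java 8, so whatever JVM NLTK launched (`/Library/Java/Home/bin/java` in the failing command) is older than Java 8, even though Java 8 is installed on the machine. As a minimal sketch, assuming Python 3, the class-file header can be read straight out of a jar to see which Java release it targets. The helper names `class_major_version` and `required_java_release` are hypothetical, not part of NLTK.

```python
import struct
import zipfile

# For class files from Java 5 onward, release = major version - 44
# (49 -> Java 5, 50 -> Java 6, 51 -> Java 7, 52 -> Java 8, ...).
_MAJOR_OFFSET = 44


def class_major_version(header: bytes) -> int:
    """Return the major version stored in the first 8 bytes of a .class file.

    Layout: u4 magic (0xCAFEBABE), u2 minor_version, u2 major_version, big-endian.
    """
    magic, _minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major


def required_java_release(jar_path, class_entry: str) -> int:
    """Hypothetical helper: which Java release a class inside a jar was compiled for.

    jar_path may be a filesystem path or a file-like object; class_entry is the
    entry name inside the jar, e.g. 'edu/stanford/nlp/tagger/maxent/MaxentTagger.class'.
    """
    with zipfile.ZipFile(jar_path) as jar:
        major = class_major_version(jar.read(class_entry)[:8])
    if major < 49:
        raise ValueError("pre-Java-5 class file; release mapping not covered")
    return major - _MAJOR_OFFSET
```

For example, `required_java_release("/Users/johntorr/VirtualAssistantProject/stanford-postagger/stanford-postagger-3.6.0.jar", "edu/stanford/nlp/tagger/maxent/MaxentTagger.class")` would map a major version of 52, as reported in the error, to Java 8. Any JVM older than the returned release fails with `UnsupportedClassVersionError` when loading that class.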

In my .bash_profile file, I added the following lines:

export CLASSPATH=${CLASSPATH}:/Users/johntorr/VirtualAssistantProject/stanford-postagger/stanford-postagger.jar
export STANFORD_MODELS=/Users/johntorr/VirtualAssistantProject/stanford-postagger/models
export JAVA_HOME=/Library/Java/Home
export PATH=$PATH:$JAVA_HOME/bin

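There seem to be several other places on my machine with a Java home folder and a java executable, but I've tried using those as well and none of them worked. I also tried the solution here: https://gist.github.com/alvations/e1df0ba227e542955a8a, which someone posted in a different thread, but that didn't work either. I'd be very grateful if anyone could help me solve this!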
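With several Java installations on one machine, it can help to ask each candidate `java` binary which version it is before pointing `JAVA_HOME` at its home folder; the failing command above used `/Library/Java/Home/bin/java`, which matches the `JAVA_HOME` export. A minimal sketch, assuming Python 3.7+ (for `capture_output`); `parse_java_version` and `java_release` are hypothetical helper names. `java -version` prints its banner to stderr, and releases up to Java 8 report themselves with the legacy `1.x` numbering.

```python
import re
import subprocess
from typing import Optional

# Matches the quoted version in banners such as:
#   java version "1.8.0_77"            (legacy 1.x numbering, up to Java 8)
#   openjdk version "11.0.2" 2019-01-15 (Java 9+ numbering)
_VERSION_RE = re.compile(r'version "(\d+)(?:\.(\d+))?')


def parse_java_version(banner: str) -> Optional[int]:
    """Extract the Java release number from `java -version` output, or None."""
    match = _VERSION_RE.search(banner)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        return int(second)  # "1.6.0_65" -> 6, "1.8.0_77" -> 8
    return int(first)  # "11.0.2" -> 11, "17" -> 17


def java_release(java_bin: str) -> Optional[int]:
    """Run `<java_bin> -version` and parse the banner it prints on stderr."""
    result = subprocess.run([java_bin, "-version"], capture_output=True, text=True)
    return parse_java_version(result.stderr or result.stdout)
```

Calling `java_release("/Library/Java/Home/bin/java")` and comparing it against each other candidate shows which binary is actually Java 8; anything below 8 cannot load classes with major version 52. On macOS, `/usr/libexec/java_home -v 1.8` prints the home directory of an installed Java 8, which is the kind of path `JAVA_HOME` would need to hold.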

1 Answer:

Answer 0 (score: 0)

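Thanks for all your suggestions, Alvas. I actually managed to solve the problem by installing the following older version of the POS tagger from 2014, which still includes everything needed: http://nlp.stanford.edu/software/stanford-postagger-full-2014-01-04.zip

Apparently the latest Stanford Parser has exactly the same problem, so people have been using its 2014 version as well.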