I'm trying to use StanfordTokenizer tokenize() in my script, but it can't seem to find the jar in the CLASSPATH I set.
I tried changing _JAR = 'stanford-postagger.jar' to C:\Program Files\JetBrains\PyCharm 2017.1.2\stanford-postagger-2016-10-31\'stanford-postagger.jar', but that didn't work either.
Here is my script:
from nltk.tokenize.stanford import StanfordTokenizer
def AnalyzeText(text):
    t = StanfordTokenizer(path_to_jar='C:\Program Files\JetBrains\PyCharm 2017.1.2\stanford-postagger-2016-10-31\stanford-postagger.jar')
    return t.tokenize(text)
I did update nltk. I also downloaded stanford-postagger, as you can see from the path above. I can't figure out what the problem is.
Answer 0 (score: 0)
In Python 3, run this once:
import urllib.request
import zipfile
# Download the file.
urllib.request.urlretrieve(r'http://nlp.stanford.edu/software/stanford-postagger-full-2015-04-20.zip', r'C:\Program Files\JetBrains\PyCharm 2017.1.2\stanford-postagger-full-2015-04-20.zip')
# Initialize a zipfile object.
zfile = zipfile.ZipFile(r'C:\Program Files\JetBrains\PyCharm 2017.1.2\stanford-postagger-full-2015-04-20.zip')
# Unzip the file.
zfile.extractall(r'C:\Program Files\JetBrains\PyCharm 2017.1.2\stanford-pos')
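Since the download and unzip only need to happen once, it may help to make the extraction step safe to re-run. The following is a minimal sketch, not part of the original answer; `extract_once` is a hypothetical helper name, and it assumes a non-empty destination folder means the archive was already unpacked.

```python
import os
import zipfile


def extract_once(zip_path, dest_dir):
    """Extract zip_path into dest_dir unless dest_dir already has content.

    Returns True if extraction happened, False if it was skipped.
    (Hypothetical helper; the "non-empty folder" check is an assumption.)
    """
    if os.path.isdir(dest_dir) and os.listdir(dest_dir):
        return False
    with zipfile.ZipFile(zip_path) as zfile:
        zfile.extractall(dest_dir)
    return True
```

Call it with the same zip path and target folder used above; a second call returns False and leaves the folder alone.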
Then:
from nltk.tokenize.stanford import StanfordTokenizer
# First we set the direct path to the Stanford POS tagger jar.
_path_to_jar = r'C:\Program Files\JetBrains\PyCharm 2017.1.2\stanford-pos\stanford-postagger-full-2015-04-20\stanford-postagger.jar'
# Then we initialize the NLTK's Stanford Tokenizer.
st = StanfordTokenizer(path_to_jar=_path_to_jar)
st.tokenize(text)
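If the jar still isn't found, one thing worth checking is whether the file really is where the hard-coded path says it is, since the folder name inside the zip can differ between releases. Here is a small sketch that searches the extraction folder for the jar; `find_jar` is a hypothetical helper written for this post, not an NLTK function.

```python
import os


def find_jar(root_dir, jar_name="stanford-postagger.jar"):
    """Return the full path of the first file named jar_name under root_dir.

    Returns None if no such file exists. (Hypothetical helper for illustration.)
    """
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        if jar_name in filenames:
            return os.path.join(dirpath, jar_name)
    return None
```

If it returns a path, pass that to `StanfordTokenizer(path_to_jar=...)`; if it returns None, the jar was never extracted into that folder.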
Answer 1 (score: 0)
The CLASSPATH seems fine. NLTK needs JDK 1.8 to run the Stanford jar, so point it at your Java executable:
import os
java_path = "C:/Program Files/Java/jdk1.8.0_131/bin/java.exe"
os.environ['JAVAHOME'] = java_path
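The JDK folder name will differ from machine to machine, so a hard-coded path can silently point at nothing. As a rough sketch (the `configure_java` name is made up for this example), you could set `JAVAHOME` only when a Java executable is actually found, falling back to whatever `java` is on PATH:

```python
import os
import shutil


def configure_java(java_path=None):
    """Set the JAVAHOME environment variable if a Java executable can be found.

    java_path: explicit path to java / java.exe. If it is None or does not
    point to an existing file, fall back to looking up "java" on PATH.
    Returns the path that was set, or None if nothing was found.
    (Hypothetical helper; it does not check the Java version.)
    """
    if java_path and os.path.isfile(java_path):
        candidate = java_path
    else:
        candidate = shutil.which("java")
    if candidate is None:
        return None
    os.environ["JAVAHOME"] = candidate
    return candidate
```

Call it before creating the tokenizer, for example with the `java_path` value shown above; a None result means no Java executable was found at all.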