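I set up the environment variables following the instructions on the official NLTK wiki, but the first example fails with the error shown below. Here is the code: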
from nltk.tokenize import StanfordSegmenter
datapath = "D:/Coding/stanford-segmenter/"
corporadict = datapath+"data/"
modelpath = datapath + "data/pku.gz"
dictpath = datapath + "data/dict-chris6.ser.gz"
segmenter = StanfordSegmenter(path_to_sihan_corpora_dict=corporadict,path_to_model=modelpath,path_to_dict=dictpath)
res = segmenter.segment(u"这是斯坦福中文分词器")
But Python gave me the following error:
Traceback (most recent call last):
  File "D:/Video data/data_processed/ugctext/test_stanford.py", line 19, in <module>
    res = segmenter.segment(u"这是斯坦福中文分词器")
  File "C:\Python35\lib\site-packages\nltk\tokenize\stanford_segmenter.py", line 164, in segment
    return self.segment_sents([tokens])
  File "C:\Python35\lib\site-packages\nltk\tokenize\stanford_segmenter.py", line 192, in segment_sents
    stdout = self._execute(cmd)
  File "C:\Python35\lib\site-packages\nltk\tokenize\stanford_segmenter.py", line 211, in _execute
    stdout, _stderr = java(cmd, classpath=self._stanford_jar, stdout=PIPE, stderr=PIPE)
  File "C:\Python35\lib\site-packages\nltk\internals.py", line 129, in java
    p = subprocess.Popen(cmd, stdin=stdin, stdout=stdout, stderr=stderr)
  File "C:\Python35\lib\subprocess.py", line 947, in __init__
    restore_signals, start_new_session)
  File "C:\Python35\lib\subprocess.py", line 1198, in _execute_child
    args = list2cmdline(args)
  File "C:\Python35\lib\subprocess.py", line 751, in list2cmdline
    needquote = (" " in arg) or ("\t" in arg) or not arg
TypeError: argument of type 'NoneType' is not iterable
Can anyone help me solve this problem? Thanks!
Answer 0 (score: 0)
For some reason, list2cmdline(args) in subprocess.py is receiving an argument list that contains None, and it does not handle that case. My guess is that the problem lies in the java() call (defined in nltk/internals.py and invoked from stanford_segmenter.py, as the traceback shows).
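The failing line in the traceback, `needquote = (" " in arg) or ...`, raises this `TypeError` whenever one element of the command list is `None`. Below is a minimal sketch of how such an entry could be spotted before the list reaches `subprocess.Popen`. The helper names `find_none_args` and `membership_fails` and the sample command are hypothetical, and the sample is not the command NLTK actually builds:

```python
def find_none_args(cmd):
    """Return the positions of any None entries in a command list."""
    return [i for i, arg in enumerate(cmd) if arg is None]


def membership_fails(arg):
    """Mimic the check from the traceback: `" " in arg`."""
    try:
        " " in arg
    except TypeError:
        return True
    return False


# Hypothetical command list with one missing (None) element.
sample_cmd = ["java", "-cp", "stanford-segmenter.jar", None,
              "-loadClassifier", "pku.gz"]

print(find_none_args(sample_cmd))   # positions of the None entries
print(membership_fails(None))       # whether the membership test raises
```

Printing the command list inside `_execute` in the same way would show which constructor argument ended up as `None`.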
From here you can see that the code was updated in 2014 to require Java 8. If your Java version is older than that, this may be the problem.
Answer 1 (score: 0)
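To rule out the Java version, something like the following could be used. This is only a sketch: the function names `parse_java_major` and `installed_java_major` are made up for illustration, and it assumes `java -version` prints a line containing `version "..."`, as the common JDK distributions do:

```python
import re
import subprocess


def parse_java_major(version_text):
    """Extract the major Java version from `java -version` output.

    Handles the legacy "1.8.0_101" scheme (major is 8) and the newer
    "11.0.2" scheme (major is 11). Returns None if nothing matches.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_text)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major():
    """Run `java -version` and return the major version.

    Returns None if java is not on PATH or the output cannot be parsed.
    """
    try:
        result = subprocess.run(
            ["java", "-version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # java prints its version to stderr
            universal_newlines=True,
        )
    except FileNotFoundError:
        return None
    return parse_java_major(result.stdout)
```

If `installed_java_major()` returns a number below 8, upgrading Java would be the first thing to try.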
You may need to pass the java_class parameter explicitly. For example:
segmenter = StanfordSegmenter(
    java_class='edu.stanford.nlp.ie.crf.CRFClassifier',
    path_to_sihan_corpora_dict=corporadict,
    path_to_model=modelpath,
    path_to_dict=dictpath
)