Question

I am trying to run the following code:

import os
os.environ['JAVAHOME'] = 'path/to/java.exe'
os.environ['STANFORD_PARSER'] = 'path/to/stanford-parser.jar'
os.environ['STANFORD_MODELS'] = 'path/to/stanford-parser-3.8.0-models.jar'

from nltk.parse.stanford import StanfordDependencyParser
dep_parser = StanfordDependencyParser(model_path="path/to/englishPCFG.ser.gz")
sentence = "sample sentence ..."

# Dependency Parsing:
print("Dependency Parsing:")
print([parse.tree() for parse in dep_parser.raw_parse(sentence)])

On this line:

print([parse.tree() for parse in dep_parser.raw_parse(sentence)])

I get the following error:

Traceback (most recent call last):
  File "C:/Users/Norbert/PycharmProjects/untitled/StanfordDependencyParser.py", line 21, in <module>
    print([parse.tree() for parse in dep_parser.raw_parse(sentence)])
  File "C:\Users\Norbert\AppData\Local\Programs\Python\Python36\lib\site-packages\nltk\parse\stanford.py", line 134, in raw_parse
    return next(self.raw_parse_sents([sentence], verbose))
  File "C:\Users\Norbert\AppData\Local\Programs\Python\Python36\lib\site-packages\nltk\parse\stanford.py", line 152, in raw_parse_sents
    return self._parse_trees_output(self._execute(cmd, '\n'.join(sentences), verbose))
  File "C:\Users\Norbert\AppData\Local\Programs\Python\Python36\lib\site-packages\nltk\parse\stanford.py", line 218, in _execute
    stdout=PIPE, stderr=PIPE)
  File "C:\Users\Norbert\AppData\Local\Programs\Python\Python36\lib\site-packages\nltk\internals.py", line 135, in java
    print(_decode_stdoutdata(stderr))
  File "C:\Users\Norbert\AppData\Local\Programs\Python\Python36\lib\site-packages\nltk\internals.py", line 737, in _decode_stdoutdata
    return stdoutdata.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xac in position 3097: invalid start byte

Any idea what could be wrong? I'm not even processing any non-UTF-8 text.
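One reading of the traceback: the last frames are NLTK's java() helper calling print(_decode_stdoutdata(stderr)), i.e. decoding what the Java process wrote to stderr, not your sentence. In the NLTK source that print sits in the branch that handles a non-zero Java exit code, so Java most likely failed, and the UnicodeDecodeError then hides Java's own message. Byte 0xac can never start a UTF-8 sequence, so no non-UTF-8 input is needed to trigger this. The sketch below reproduces the failure in plain Python and shows a tolerant way to read a child process's stderr. The sample bytes, the run_and_read_stderr helper and the choice of cp1252 as an example Windows code page are all assumptions for illustration.

```python
import subprocess
import sys
import unicodedata

# Hypothetical stand-in for the bytes Java wrote to stderr; 0xac is the byte
# named in the traceback.
raw = b"Error: \xac"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xac is a UTF-8 continuation byte, so it can never start a character.
    utf8_error = str(exc)
print(utf8_error)

# Decoded with cp1252 (an assumed example of a Windows code page) the same
# byte is an ordinary character.
print(unicodedata.name(raw.decode("cp1252")[-1]))


def run_and_read_stderr(cmd):
    """Hypothetical helper: run cmd and return (exit code, stderr as text),
    replacing bytes that are not valid UTF-8 instead of raising."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    _, err = proc.communicate()
    # errors="replace" turns undecodable bytes into U+FFFD.
    return proc.returncode, err.decode("utf-8", errors="replace")


# Demo child process that writes byte 0xac to stderr and exits with status 1,
# mimicking a failing Java call.
child = [sys.executable, "-c",
         "import sys; sys.stderr.buffer.write(b'Error: \\xac'); sys.exit(1)"]
code, text = run_and_read_stderr(child)
print(code, ascii(text))  # ascii() keeps the output printable on any console
```

Pointing a helper like this at your own java invocation (for example path/to/java.exe with -version, which reports on stderr) may show Java's actual error instead of the decode failure.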

Answer 1

I was able to print something by doing the following. It may not be exactly what you want, but it's a start.

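print("Dependency Parsing:")
result = dep_parser.raw_parse(sentence)
dep = next(result)  # raw_parse returns an iterator; take the first parse
# print(dep)  # uncomment to see the whole DependencyGraph
print(list(dep.triples()))

Uncomment the print(dep) line if you want to see the entire output, not just the ((head, tag), relation, (dependent, tag)) triples.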
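Why the parse is fetched with next() only once: raw_parse returns an iterator, so each next(result) call consumes one parse. Printing next(result) and then calling next(result) again means the second call no longer sees the first parse, and for a single sentence there is normally only one, so it raises StopIteration. The sketch below uses a hypothetical fake_raw_parse function returning a one-element iterator as a stand-in for the parser (a string replaces the real DependencyGraph).

```python
def fake_raw_parse(sentence):
    """Hypothetical stand-in for dep_parser.raw_parse(sentence):
    an iterator yielding a single parse."""
    return iter(["graph for: " + sentence])


result = fake_raw_parse("sample sentence")
first = next(result)           # consumes the only parse
leftover = next(result, None)  # default avoids StopIteration; nothing is left

# Safer pattern: materialize all parses once, then reuse them freely.
parses = list(fake_raw_parse("sample sentence"))
print(len(parses), parses[0])
```

With the real parser the same pattern would be parses = list(dep_parser.raw_parse(sentence)), after which parses[0] can be printed and its triples() listed as often as needed.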
