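I am using the MaltParser interface from Python's NLTK. I have downloaded the training data and updated NLTK to the latest version. When I construct the MaltParser, it raises an AssertionError. Below is the Python code together with the traceback.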
mp = MaltParser("C:/Users/mustufain/Desktop/Python Files/maltparser-1.8.1","C:/Users/mustufain/Desktop/Python Files/maltparser-1.7.2",additional_java_args=['-Xmx512m'])
Traceback (most recent call last):
File "<pyshell#10>", line 1, in <module>
mp = MaltParser("C:/Users/mustufain/Desktop/Python Files/maltparser-1.8.1","C:/Users/mustufain/Desktop/Python Files/maltparser-1.7.2",additional_java_args=['-Xmx512m'])
File "C:\Python34\lib\site-packages\nltk\parse\malt.py", line 131, in __init__
self.malt_jars = find_maltparser(parser_dirname)
File "C:\Python34\lib\site-packages\nltk\parse\malt.py", line 72, in find_maltparser
assert malt_dependencies.issubset(_jars)
AssertionError
>>>
Answer 0 (score: 2)
TL;DR
(In Python 3!):
import urllib.request
import zipfile
urllib.request.urlretrieve('http://www.maltparser.org/mco/english_parser/engmalt.poly-1.7.mco', 'C:\\Users\\mustufain\\Desktop\\engmalt.poly-1.7.mco')
urllib.request.urlretrieve('http://maltparser.org/dist/maltparser-1.8.1.zip', 'C:\\Users\\mustufain\\Desktop\\maltparser-1.8.1.zip')
zfile = zipfile.ZipFile('C:\\Users\\mustufain\\Desktop\\maltparser-1.8.1.zip')
zfile.extractall('C:\\Users\\mustufain\\Desktop\\maltparser-1.8.1\\')
Then:
from nltk.parse import malt
mp = malt.MaltParser('C:\\Users\\mustufain\\Desktop\\maltparser-1.8.1\\', "C:\\Users\\mustufain\\Desktop\\engmalt.poly-1.7.mco")
mp.parse_one('I shot an elephant in my pajamas .'.split()).tree()
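Answer 1 (score: 1)
If all the downloads and environment variables are set up correctly, the most likely cause is the way nltk/parse/malt.py splits file and directory paths. The code at https://github.com/nltk/nltk/blob/develop/nltk/parse/malt.py#L69 separates directory and file names in a Linux-specific way: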
def find_maltparser(parser_dirname):
    """
    A module to find MaltParser .jar file and its dependencies.
    """
    if os.path.exists(parser_dirname): # If a full path is given.
        _malt_dir = parser_dirname
    else: # Try to find path to maltparser directory in environment variables.
        _malt_dir = find_dir(parser_dirname, env_vars=('MALT_PARSER',))
    # Checks that that the found directory contains all the necessary .jar
    malt_dependencies = ['','','']
    _malt_jars = set(find_jars_within_path(_malt_dir))
    _jars = set(jar.rpartition('/')[2] for jar in _malt_jars)
    malt_dependencies = set(['log4j.jar', 'libsvm.jar', 'liblinear-1.8.jar'])
    assert malt_dependencies.issubset(_jars)
    assert any(filter(lambda i: i.startswith('maltparser-') and i.endswith('.jar'), _jars))
    return list(_malt_jars)
The bug has been fixed and the fix is being merged: https://github.com/nltk/nltk/pull/1292
In the meantime, change this line:
    _jars = set(jar.rpartition('/')[2] for jar in _malt_jars)
to:
    _jars = set(os.path.split(jar)[1] for jar in _malt_jars)
That should solve your problem =)
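To see why the original line trips the assertion on Windows, here is a minimal sketch comparing the two expressions on one path. The jar path is a made-up example, and `ntpath` (the module that `os.path` resolves to on Windows) is used directly so the snippet behaves the same on any OS.

```python
import ntpath  # os.path is ntpath on Windows; imported directly for portability

# Hypothetical jar location, written the way Windows reports it.
jar = r"C:\maltparser-1.8.1\lib\log4j.jar"

# The path contains no '/', so rpartition returns the whole path
# as its last element instead of just the file name.
linux_style = jar.rpartition('/')[2]

# Splitting with the platform's path rules yields the bare file name.
portable = ntpath.split(jar)[1]

print(linux_style)
print(portable)

required = {'log4j.jar'}
print(required.issubset({linux_style}))  # the check that fails in find_maltparser
print(required.issubset({portable}))
```

With the first expression the set of jar names holds full paths, so the `issubset` check against bare file names cannot succeed, which matches the AssertionError in the question.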
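For issues that are not about the code itself but about how to set the environment variables, or how to download and save the MaltParser directory and model files, see https://github.com/nltk/nltk/issues/1294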
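To rule out a setup problem before suspecting the path-splitting bug, a quick check along these lines may help. This is only a sketch: `missing_malt_jars` is a hypothetical helper, not part of NLTK, and the required jar names are copied from the `find_maltparser` listing quoted in this answer.

```python
import os

# Jar names copied from the find_maltparser listing quoted above.
REQUIRED_JARS = {'log4j.jar', 'libsvm.jar', 'liblinear-1.8.jar'}

def missing_malt_jars(malt_dir):
    """Return the required jar names not found anywhere under malt_dir."""
    found = set()
    for _root, _dirs, files in os.walk(malt_dir):
        # os.walk hands back bare file names, so no path splitting is needed.
        found.update(name for name in files if name.endswith('.jar'))
    return REQUIRED_JARS - found

if __name__ == '__main__':
    # Path taken from the first answer; substitute your own directory.
    malt_dir = 'C:\\Users\\mustufain\\Desktop\\maltparser-1.8.1\\'
    print(missing_malt_jars(malt_dir))
```

An empty set means the dependency jars are present, in which case the AssertionError more likely comes from the path handling described above than from the download.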