I am trying to load a custom model named "ru2" into spaCy (for NLP processing).
It can be found here: https://github.com/buriy/spacy-ru
The problem is that when I call
nlp = spacy.load('ru2')
doc = nlp(text)
I get this error:
C:\ProgramData\Anaconda3\lib\importlib\_bootstrap.py:205: RuntimeWarning: spacy.tokens.span.Span size changed, may indicate binary incompatibility. Expected 72 from C header, got 80 from PyObject
return f(*args, **kwds)
Traceback (most recent call last):
File "C://.../nlp/src/ie/main.py", line 125, in <module>
main(examp_dict['Poroshenko'])
File "C://.../nlp/src/ie/main.py", line 92, in main
nlp = spacy.load('ru2')
File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\__init__.py", line 27, in load
return util.load_model(name, **overrides)
File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\util.py", line 133, in load_model
return load_model_from_path(Path(name), **overrides)
File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\util.py", line 173, in load_model_from_path
return nlp.from_disk(model_path)
File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\language.py", line 791, in from_disk
util.from_disk(path, deserializers, exclude)
File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\util.py", line 630, in from_disk
reader(path / key)
File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\language.py", line 781, in <lambda>
deserializers["tokenizer"] = lambda p: self.tokenizer.from_disk(p, exclude=["vocab"])
File "tokenizer.pyx", line 391, in spacy.tokenizer.Tokenizer.from_disk
File "tokenizer.pyx", line 432, in spacy.tokenizer.Tokenizer.from_bytes
File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\util.py", line 606, in from_bytes
msg = srsly.msgpack_loads(bytes_data)
File "C:\ProgramData\Anaconda3\lib\site-packages\srsly\_msgpack_api.py", line 29, in msgpack_loads
msg = msgpack.loads(data, raw=False, use_list=use_list)
File "C:\ProgramData\Anaconda3\lib\site-packages\srsly\msgpack\__init__.py", line 60, in unpackb
return _unpackb(packed, **kwargs)
File "_unpacker.pyx", line 191, in srsly.msgpack._unpacker.unpackb
TypeError: unhashable type: 'list'
I searched the internet for similar problems:
but none of those solutions worked for me.
I am using
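Answer 0: (score: 0)
This may be because the version of spaCy used to build the model differs from the version of spaCy you have installed. (I don't know this for certain; I am mentioning it just in case.)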
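One way to test this guess is to read the version requirement the model ships with. The following is a minimal sketch, assuming the model directory contains a meta.json file with a "spacy_version" entry, as spaCy model packages usually do; the function name model_spacy_requirement is made up for illustration.

```python
import json
from pathlib import Path


def model_spacy_requirement(model_dir):
    """Return the 'spacy_version' entry from a model's meta.json, or None.

    Assumption: the model directory holds a meta.json with a
    'spacy_version' key. The function name is hypothetical.
    """
    meta_path = Path(model_dir) / "meta.json"
    if not meta_path.is_file():
        # No metadata file found, so nothing can be said about the version.
        return None
    with meta_path.open(encoding="utf8") as f:
        meta = json.load(f)
    return meta.get("spacy_version")
```

Calling model_spacy_requirement('ru2') and comparing the result with spacy.__version__ should show whether the two disagree.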
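Answer 1: (score: 0)
This is from https://spacy.io/usage#troubleshooting
If you are training models, writing them to disk, and versioning them with git, you might encounter this error when trying to load them in a Windows environment. This happens because a default install of Git for Windows is configured to automatically convert Unix-style end-of-line characters (LF) to Windows-style ones (CRLF) during file checkout (and the reverse when committing). While that is mostly fine for text files, a trained model written to disk contains some binary files that should not go through this conversion. When they do, you get the error above. You can fix it either by changing your core.autocrlf setting to "false", or by committing a .gitattributes file to your repository that tells git which files or folders should not undergo LF-to-CRLF conversion, with an entry like path/to/spacy/model/** -text. After you have done either of these, clone your repository again.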
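The .gitattributes option can be sketched as a small helper. This is only an illustration: the function names gitattributes_entry and ensure_gitattributes are hypothetical, and the entry format is the one quoted from the spaCy page above.

```python
from pathlib import Path


def gitattributes_entry(model_path):
    """Build an entry such as 'path/to/spacy/model/** -text'."""
    # Git patterns use forward slashes, including on Windows.
    posix = str(model_path).replace("\\", "/").strip("/")
    return f"{posix}/** -text"


def ensure_gitattributes(repo_root, model_path):
    """Append the entry to <repo_root>/.gitattributes if it is missing.

    Returns True if the file was changed, False if the entry was
    already there. Both function names are hypothetical.
    """
    attrs = Path(repo_root) / ".gitattributes"
    entry = gitattributes_entry(model_path)
    text = attrs.read_text(encoding="utf8") if attrs.exists() else ""
    if entry in (line.strip() for line in text.splitlines()):
        return False
    if text and not text.endswith("\n"):
        # Keep the new entry on its own line.
        text += "\n"
    attrs.write_text(text + entry + "\n", encoding="utf8")
    return True
```

After committing the resulting .gitattributes file, the repository still has to be cloned again, as the quoted text says, so that the model files are checked out without conversion.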