spaCy: Error when trying to load a serialized Doc

Time: 2018-08-06 15:31:31

Tags: python nlp spacy

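I am trying to serialize and deserialize a spaCy Doc (setup: Windows 7, Anaconda), and I am getting errors that I have not been able to find an explanation for. Here is a code snippet and the error it produces: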

import spacy
nlp = spacy.load('en')
text = 'This is a test.'
doc = nlp(text)
fout = 'test.spacy' # <-- according to the API for Doc.to_disk(), this needs to be a directory (but for me, spaCy writes a file)
doc.to_disk(fout)
doc.from_disk(fout)
Traceback (most recent call last):

  File "<ipython-input-7-aa22bf1b9689>", line 1, in <module>
    doc.from_disk(fout)

  File "doc.pyx", line 763, in spacy.tokens.doc.Doc.from_disk

  File "doc.pyx", line 806, in spacy.tokens.doc.Doc.from_bytes

ValueError: [E033] Cannot load into non-empty Doc of length 5.

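I also tried creating a new Doc object and loading into that, as shown in the example in the spaCy docs ("Example: Saving and loading a document"), which results in a different error: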

from spacy.tokens import Doc
from spacy.vocab import Vocab

new_doc = Doc(Vocab()).from_disk(fout)
Traceback (most recent call last):

  File "<ipython-input-16-4d99a1199f43>", line 1, in <module>
    Doc(Vocab()).from_disk(fout)

  File "doc.pyx", line 763, in spacy.tokens.doc.Doc.from_disk

  File "doc.pyx", line 838, in spacy.tokens.doc.Doc.from_bytes

  File "stringsource", line 646, in View.MemoryView.memoryview_cwrapper

  File "stringsource", line 347, in View.MemoryView.memoryview.__cinit__

ValueError: buffer source array is read-only

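Edit:

As pointed out in the answer, the path provided should be a directory. However, the first code snippet creates a file. Changing it to a non-existent directory path does not help, because spaCy still creates a file. Attempting to write to an existing directory also causes an error: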

fout = 'data'

doc.to_disk(fout)
Traceback (most recent call last):

  File "<ipython-input-8-6c30638f4750>", line 1, in <module>
    doc.to_disk(fout)

  File "doc.pyx", line 749, in spacy.tokens.doc.Doc.to_disk

  File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 1161, in open
    opener=self._opener)

  File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 1015, in _opener
    return self._accessor.open(self, flags, mode)

  File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 387, in wrapped
    return strfunc(str(pathobj), *args)

PermissionError: [Errno 13] Permission denied: 'data'

Python has no problem writing to this location with standard file operations (open/read/write).

Using a Path object yields the same result:

from pathlib import Path

import os

fout = Path(os.path.join(os.getcwd(), 'data'))

doc.to_disk(fout)
Traceback (most recent call last):

  File "<ipython-input-17-6c30638f4750>", line 1, in <module>
    doc.to_disk(fout)

  File "doc.pyx", line 749, in spacy.tokens.doc.Doc.to_disk

  File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 1161, in open
    opener=self._opener)

  File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 1015, in _opener
    return self._accessor.open(self, flags, mode)

  File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 387, in wrapped
    return strfunc(str(pathobj), *args)

PermissionError: [Errno 13] Permission denied: 'C:\\Users\\Username\\workspace\\data'

Any ideas why this might be happening?

1 Answer:

Answer 0: (score: 1)

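doc.to_disk(fout)

fout must be

  a path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects.

as stated in the spaCy documentation at https://spacy.io/api/doc

Try changing fout to a directory; that might do the trick.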
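For what it's worth, the PermissionError traceback in the question's edit shows Doc.to_disk going straight into pathlib's open, which suggests that this spaCy version opens the given path as a file rather than creating a directory. The stdlib-only sketch below reproduces that failure mode without spaCy. The helper name try_write_bytes and the placeholder payload are made up for illustration; this is not spaCy's actual code.

```python
import tempfile
from pathlib import Path


def try_write_bytes(path, payload=b"placeholder"):
    """Open `path` for binary writing, the way the traceback suggests to_disk does.

    Returns "ok" on success, otherwise the name of the OSError subclass raised.
    """
    try:
        with Path(path).open("wb") as file_:
            file_.write(payload)
        return "ok"
    except OSError as err:
        # Typically PermissionError on Windows and IsADirectoryError on Linux/macOS.
        return type(err).__name__


with tempfile.TemporaryDirectory() as tmp:
    existing_dir = Path(tmp) / "data"
    existing_dir.mkdir()

    # Opening an existing directory as if it were a file fails.
    dir_result = try_write_bytes(existing_dir)

    # A path that does not exist yet is simply created as a file.
    file_result = try_write_bytes(Path(tmp) / "test.spacy")

print(dir_result, file_result)
```

If that reading of the traceback is right, it would explain both observations in the question: a new path ends up as a file, and an existing directory is rejected by the operating system.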

Edit: Examples from the spaCy documentation:

doc.to_disk

doc.to_disk('/path/to/doc')

doc.from_disk

from spacy.tokens import Doc
from spacy.vocab import Vocab
doc = Doc(Vocab()).from_disk('/path/to/doc')
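
As for the first error in the question, the E033 message says that from_disk refuses to load into a Doc that already contains tokens, so calling it on the original doc cannot work. Below is a minimal round-trip sketch, assuming a spaCy installation in which deserialization works (the question's second traceback shows it failing in the asker's environment). It loads into a fresh, empty Doc. Two things here are assumptions of mine and not taken from the question: spacy.blank("en") is used instead of spacy.load('en') only so that no downloaded model is needed, and the new Doc shares nlp.vocab, whereas the docs example uses a fresh Vocab(). The target is a file path inside a temporary directory, since the question reports that spaCy writes a file.

```python
import tempfile
from pathlib import Path

import spacy
from spacy.tokens import Doc

# Tokenizer-only English pipeline; stands in for spacy.load('en') from the question.
nlp = spacy.blank("en")
doc = nlp("This is a test.")

with tempfile.TemporaryDirectory() as tmp:
    # A path that does not exist yet; the question reports it ends up as a file.
    fout = Path(tmp) / "test.spacy"
    doc.to_disk(fout)

    # Load into a new, empty Doc instead of the already populated `doc`,
    # which is what triggers E033 in the question.
    new_doc = Doc(nlp.vocab).from_disk(fout)

tokens = [token.text for token in new_doc]
print(tokens)
```

The same pattern should apply to doc.to_bytes() and Doc(nlp.vocab).from_bytes(data) if no file on disk is needed.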