How can I store a custom class object in a spaCy Doc and use `doc.to_disk`?

Time: 2020-01-15 03:36:19

Tags: python spacy

I want to store an instance of my own class in a spacy.Doc and save it with doc.to_disk, like this:

    from spacy.tokens import Doc
    from spacy.vocab import Vocab
    from dataclasses import dataclass


    @dataclass
    class Foo:
        a: int


    doc = Doc(Vocab(), [])
    doc.user_data["foo"] = Foo(1)
    doc.to_disk("/tmp/fooo")

However, this code raises an error:

    TypeError: can not serialize 'Foo' object

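What should I do?

1 Answer:

Answer 0 (score: 1)

As suggested in this thread (here), you could try the following workaround, which clears `user_data` and the custom extension attributes so that the document can be serialized: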

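    def remove_unserializable_results(doc):
        # Drop everything stored in user_data, including the custom objects.
        doc.user_data = {}
        # dir(doc._) lists the extension names plus the helpers get/set/has.
        for x in dir(doc._):
            if x in ['get', 'set', 'has']: continue
            setattr(doc._, x, None)
        # Reset the custom extension attributes on every token as well.
        for token in doc:
            for x in dir(token._):
                if x in ['get', 'set', 'has']: continue
                setattr(token._, x, None)
        return doc

    # `nlp` is an already loaded spaCy pipeline. Passing the function itself
    # is the spaCy v2 API; in v3 a component is registered and added by name.
    nlp.add_pipe(remove_unserializable_results, last=True)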
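
The workaround above throws the custom data away. If the goal is to keep it, one option is to store a plain dict instead of the `Foo` instance, since the error comes from the serializer not knowing how to encode an arbitrary class. The following is only a sketch under that assumption: the helper names `to_user_data` and `from_user_data` are hypothetical and not part of spaCy, and it assumes every field of the dataclass is itself a built-in type (int, str, list, dict, and so on).

```python
from dataclasses import asdict, dataclass


@dataclass
class Foo:
    a: int


def to_user_data(obj):
    """Hypothetical helper: turn a dataclass instance into a plain dict."""
    return asdict(obj)


def from_user_data(cls, payload):
    """Hypothetical helper: rebuild the dataclass from the stored dict."""
    return cls(**payload)


# Round trip without spaCy, to show the shape of the stored value.
payload = to_user_data(Foo(1))
restored = from_user_data(Foo, payload)
```

With this in place you would assign `doc.user_data["foo"] = to_user_data(Foo(1))` before calling `doc.to_disk(...)`, and call `from_user_data(Foo, doc.user_data["foo"])` after loading the document again. I have not checked this against every spaCy version, so treat it as a starting point rather than a guaranteed fix.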