I trained a FastText model for French using the Gensim library. Suddenly, the trained model can no longer be loaded into memory.
I am using the following code:
from gensim.models import FastText
fname = "filename"
model = FastText.load(fname)
It raises the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/gensim/models/fasttext.py", line 1070, in load
    model = super(FastText, cls).load(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/gensim/models/base_any2vec.py", line 1244, in load
    model = super(BaseWordEmbeddingsModel, cls).load(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/gensim/models/base_any2vec.py", line 603, in load
    return super(BaseAny2VecModel, cls).load(fname_or_handle, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/gensim/utils.py", line 426, in load
    obj = unpickle(fname)
  File "/usr/local/lib/python3.7/site-packages/gensim/utils.py", line 1384, in unpickle
    return _pickle.load(f, encoding='latin1')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 14072054: invalid start byte
Since the model was trained on a large dataset, is there any way to recover or load it?
Answer 0 (score: 0)
This error means that text stored in the model does not conform to the utf-8 encoding.
To keep using the already-trained model, set the unicode_errors flag when loading it:
from gensim.models import FastText
fname = "filename"
model = FastText.load(fname, unicode_errors='ignore')
However, this causes the offending words/characters to be ignored, which may not be ideal.
A better approach is to train the model with a setup that conforms to utf-8, but that requires retraining.
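To make the trade-off concrete, here is a minimal sketch using only Python's built-in bytes.decode, not Gensim. The byte string and the invalid_utf8_lines helper are made up for illustration; they are not taken from the model file or from any library. It shows why a byte such as 0x86 is rejected under strict utf-8 decoding, what the 'ignore' and 'replace' error handlers do with it, and how one might locate offending lines in a corpus before retraining.

```python
# Illustrative only: this byte string is invented, not read from the model file.
raw = b"fran\x86ais"  # 0x86 cannot start a UTF-8 sequence


def try_strict(data: bytes) -> str:
    """Return the decoded text, or a short error message if strict decoding fails."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"error: {exc.reason}"


strict_result = try_strict(raw)
ignored = raw.decode("utf-8", errors="ignore")    # silently drops the bad byte
replaced = raw.decode("utf-8", errors="replace")  # substitutes U+FFFD for the bad byte


def invalid_utf8_lines(lines):
    """Yield 1-based numbers of byte lines that are not valid UTF-8 (hypothetical helper)."""
    for number, line in enumerate(lines, start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            yield number


# A tiny stand-in for a training corpus read in binary mode.
corpus = [b"bonjour\n", raw + b"\n", "français\n".encode("utf-8")]
bad_lines = list(invalid_utf8_lines(corpus))

print(strict_result)
print(repr(ignored), repr(replaced))
print(bad_lines)
```

With 'ignore' the damaged token silently becomes a different string, which is the loss described above; scanning the corpus first lets you repair or drop those lines before retraining.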