Question

I get an error when trying to load a pretrained word2vec file (built with fastText) using Gensim. The file has a ".vec" extension and can be found here: http://89.38.230.23/word_embeddings/we/corola.300.20.vec.zip

So far I have tried two things. Option 1: KeyedVectors from gensim.models. Option 2: the FastText wrapper.

    # Option 1
    from gensim.models import KeyedVectors
    model = KeyedVectors.load_word2vec_format('Word_embeddings/corola.300.20.vec', binary=True)

    # Option 2
    from gensim.models.wrappers import FastText
    model = FastText.load_word2vec_format('Word_embeddings/corola.300.20.vec')

Error from Option 1: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9b in position 0: invalid start byte

Deprecation warning from Option 2: DeprecationWarning: Deprecated. Use gensim.models.KeyedVectors.load_word2vec_format instead.

I need the correct way to load this word2vec file successfully with Gensim.

Thanks.

Answer 1

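Sometimes it is best to pass the unicode_errors='ignore' parameter, because the word embedding file may contain errors. Just try:

    from gensim.models import KeyedVectors
    model = KeyedVectors.load_word2vec_format('Word_embeddings/corola.300.20.vec', binary=True, unicode_errors='ignore')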
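
One further thing that may be worth checking is the `binary` flag. fastText `.vec` files are normally plain text (a `<vocab_size> <dim>` header line followed by one word and its floats per line), and reading a text file with `binary=True` makes the loader slice the stream at fixed byte offsets. A possible explanation for the 0x9b byte, which the question does not confirm, is that a read started in the middle of a multi-byte character: the Romanian letter "ț" is encoded as 0xC8 0x9B in UTF-8. The sketch below peeks at the first two lines of the file and picks `binary` accordingly. The helper names `looks_like_text_vec` and `load_vectors` are made up for this illustration and are not part of Gensim.

```python
def looks_like_text_vec(path):
    """Heuristic: True if the file starts like a text-format word2vec file.

    Text format: a '<vocab_size> <dim>' header, then a UTF-8 line holding
    one word followed by <dim> floats.
    """
    with open(path, "rb") as fh:
        header = fh.readline(1 << 20).split()
        first_row = fh.readline(1 << 20)
    if len(header) != 2 or not all(tok.isdigit() for tok in header):
        return False
    dim = int(header[1])
    try:
        tokens = first_row.decode("utf-8").split()
        for tok in tokens[1:]:
            float(tok)
    except ValueError:
        # Covers UnicodeDecodeError too, which is a subclass of ValueError.
        return False
    return len(tokens) == dim + 1


def load_vectors(path):
    """Load vectors with Gensim, choosing `binary` from the check above."""
    # Imported lazily so the format check can be used without Gensim installed.
    from gensim.models import KeyedVectors

    return KeyedVectors.load_word2vec_format(
        path,
        binary=not looks_like_text_vec(path),
        unicode_errors="ignore",  # as suggested in this answer
    )
```

With the unzipped file from the question this would be called as `model = load_vectors('Word_embeddings/corola.300.20.vec')`. The check is only a heuristic on the first data row, so if it guesses wrong, passing `binary=False` or `binary=True` explicitly is the more reliable route.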
