How do I load a file created with numpy.savez_compressed?

Asked: 2017-04-07 16:00:37

Tags: python numpy

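I save numpy arrays with the export_vectors function defined below. It reads lines of space-separated string values and stores the numeric part of each line as floats in a numpy array.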

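import numpy as np

def export_vectors(vocab, input_filename, output_filename, dim):
    # One row per vocabulary word; words missing from the input stay all-zero.
    embeddings = np.zeros([len(vocab), dim])
    with open(input_filename) as f:
        for line in f:
            # Each line: the word followed by its space-separated vector values.
            line = line.strip().split(' ')
            word = line[0]
            embedding = line[1:]
            if word in vocab:
                word_idx = vocab[word]
                embeddings[word_idx] = np.asarray(embedding).astype(float)

    # Stores the array under the key "embeddings"; ".npz" is appended if missing.
    np.savez_compressed(output_filename, embeddings=embeddings)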

Here embeddings is an ndarray of dtype float64.

However, I then try to load the file with:

def get_vectors(filename):
    with open(filename) as f:
        return np.load(f)["embeddings"]

Loading fails with this error:

  

  File "/usr/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x99 in position 10: invalid start byte

Why does this happen?

1 Answer:

Answer 0: (score: 4)

You are probably using open incorrectly. I suspect you need to pass it a mode flag so the file is opened in binary mode, like this (docs):

open(filename, 'rb')  # r: read-only; b: binary

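The docs explain the default behavior: "Normally, files are opened in text mode, that means, you read and write strings from and to the file, which are encoded in a specific encoding." In text mode Python tries to decode the compressed bytes as UTF-8, which is what raises the UnicodeDecodeError.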
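A minimal sketch of the binary-mode fix, assuming the file was written with the key embeddings as in export_vectors. The temporary path and the small dummy array are made up for the demonstration and are not part of the original code.

```python
import os
import tempfile

import numpy as np


def get_vectors(filename):
    # 'rb' = read + binary: np.load needs the raw bytes, not decoded text.
    with open(filename, "rb") as f:
        # Index the archive inside the with-block, while the file is still open.
        return np.load(f)["embeddings"]


# Hypothetical round trip with a small dummy array (illustration only).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "vectors.npz")
original = np.arange(6, dtype=float).reshape(3, 2)
np.savez_compressed(path, embeddings=original)

loaded = get_vectors(path)
print(loaded.dtype, loaded.shape)
```

The only change from the function in the question is the "rb" mode string.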

But you can simply pass the file path itself (since np.load accepts a file-like object, string, or pathlib.Path):

np.load(filename)  # This would be more natural
                   # as it's kind of the direct inverse of your save-code;
                   # -> no manual file-handling
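Here is a rough sketch of the path-based variant. One detail worth checking in your own setup: savez_compressed appends ".npz" to the filename when it is not already there, so the name you load may differ from the name you passed when saving. The temporary directory and the array of ones are assumptions for illustration.

```python
import os
import tempfile

import numpy as np

tmp_dir = tempfile.mkdtemp()
base = os.path.join(tmp_dir, "vectors")  # hypothetical name, no ".npz" suffix

np.savez_compressed(base, embeddings=np.ones((2, 3)))

# The ".npz" extension was appended on save, so load that name.
saved_path = base + ".npz"

# np.load returns an NpzFile for .npz archives; the with-block closes it.
with np.load(saved_path) as data:
    names = list(data.files)        # keys stored in the archive
    embeddings = data["embeddings"]  # read into memory before closing

print(names, embeddings.shape)
```

Using the archive as a context manager avoids leaving the underlying file handle open.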

(A simplified rule: anything that uses general-purpose compression works with binary files, not text files!)
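To see why text mode cannot work here, the sketch below inspects what savez_compressed actually writes: a zip archive with one .npy member per array. The file name and the zero-filled array are placeholders chosen for this example.

```python
import os
import tempfile
import zipfile

import numpy as np

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "vectors.npz")  # hypothetical file name
np.savez_compressed(path, embeddings=np.zeros((2, 2)))

# Read the first two bytes in binary mode: zip archives start with b"PK".
with open(path, "rb") as f:
    magic = f.read(2)

is_zip = zipfile.is_zipfile(path)

# Each saved array becomes a "<key>.npy" member inside the archive.
with zipfile.ZipFile(path) as zf:
    members = zf.namelist()

print(magic, is_zip, members)
```

Since the content is compressed binary data rather than encoded text, a text-mode decoder has nothing valid to decode.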