Question

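I have saved many data files as .npz to save storage space (savez_compressed). Each file holds a single array, so the numpy load function returns a dictionary-like object whose one key maps to that array.

How can I quickly store this array as an array instead of a dictionary?

For example: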

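import numpy as np

data = []
datum = np.load('file.npz')
key = datum.files[0]  # files is a plain list of array names; indexing keys() is not reliable across NumPy versions
data.append([datum[key]])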

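When profiling this, my code spends most of its time in the dictionary's get method.

If the data were saved as an npy file, the get method would not be needed and loading would be faster.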

data = []
data.append([np.load('file.npy')])

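I assumed that once the file is loaded, the data is already in memory in both cases. savez_compressed does not seem to have an option to save just a single array. Is this possible, or is there a way to speed up the loading?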

Answer 1

np.load uses the np.lib.npyio.NpzFile class to load npz files. Its docstring is:

NpzFile(fid)

A dictionary-like object with lazy-loading of files in the zipped
archive provided on construction.

`NpzFile` is used to load files in the NumPy ``.npz`` data archive
format. It assumes that files in the archive have a ".npy" extension,
other files are ignored.

The arrays and file strings are lazily loaded on either
getitem access using ``obj['key']`` or attribute lookup using
``obj.f.key``. A list of all files (without ".npy" extensions) can
be obtained with ``obj.files`` and the ZipFile object itself using
``obj.zip``.

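I think the last paragraph answers your timing question. The data is not loaded until you perform the dictionary get. So it is not just an in-memory dictionary lookup; it is a file load (with decompression).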
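A minimal sketch of this lazy behaviour, assuming a recent NumPy release. The array contents and the temporary file path are made up for the illustration. It suggests that each `datum[key]` access reads the archive member again, so keeping a reference to the first result avoids repeating the decompression.

```python
import os
import tempfile

import numpy as np

# Hypothetical data and file name, used only for this illustration.
original = np.arange(1000, dtype=np.float64)
path = os.path.join(tempfile.mkdtemp(), "file.npz")
np.savez_compressed(path, original)  # stored under the default key 'arr_0'

with np.load(path) as datum:
    key = datum.files[0]           # names of the stored arrays, without ".npy"
    first = datum[key]             # reads and decompresses the archive member
    second = datum[key]            # reads and decompresses it again
    same_object = first is second  # the two accesses give separate array objects

# 'first' is an ordinary in-memory ndarray and stays usable after the file is closed.
data = [first]
```

Accessing the key once per file and appending the resulting array should therefore be the only expensive step per file.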

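Python dictionary lookups are fast; the interpreter performs them all the time when accessing an object's attributes and when simply managing a namespace.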
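To get ordinary dictionary lookups after loading, one option is to copy the arrays out of the archive into a plain dict once. This is only a sketch: `load_npz_eager`, the file name, and the array names `a` and `b` are hypothetical and not part of NumPy.

```python
import os
import tempfile

import numpy as np


def load_npz_eager(path):
    """Hypothetical helper: read every array in an .npz file into a plain dict."""
    with np.load(path) as archive:
        # Each archive[name] access decompresses that member once, here.
        return {name: archive[name] for name in archive.files}


# Hypothetical file and array names, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "example.npz")
np.savez_compressed(path, a=np.zeros(10), b=np.ones(5))

arrays = load_npz_eager(path)
a_first = arrays["a"]
a_again = arrays["a"]  # an ordinary dict lookup; no file access or decompression
```

The decompression cost is still paid once per array, but repeated lookups afterwards touch only memory.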
