Question

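Does the following read from the dataset without loading the whole thing into memory at once (the whole thing will not fit in memory), and get the size of the dataset without loading the data, using h5py in Python? If not, how can I do that?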

{{1}}

Thanks.

Answer 1

get (or indexing) fetches a reference to the dataset in the file, but does not load any data.

In [789]: list(f.keys())
Out[789]: ['dset', 'dset1', 'vset']
In [790]: d=f['dset1']
In [791]: d
Out[791]: <HDF5 dataset "dset1": shape (2, 3, 10), type "<f8">
In [792]: d.shape         # shape of dataset
Out[792]: (2, 3, 10)
In [793]: arr=d[:,:,:5]    # indexing the set fetches part of the data
In [794]: arr.shape
Out[794]: (2, 3, 5)
In [795]: type(d)
Out[795]: h5py._hl.dataset.Dataset
In [796]: type(arr)
Out[796]: numpy.ndarray
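
For readers who want to reproduce the session above outside IPython, here is a minimal self-contained sketch. The file name example.h5 and the contents of dset1 are assumptions made up for the demo; only the shape (2, 3, 10) and dtype match the session. It creates a small file, reopens it, reads the shape and byte size from the dataset handle, and then fetches one slice.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file and dataset contents, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "example.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("dset1", data=np.arange(60, dtype="f8").reshape(2, 3, 10))

with h5py.File(path, "r") as f:
    d = f["dset1"]                      # handle to the dataset; no array data read yet
    shape = d.shape                     # taken from the dataset's metadata
    dtype = d.dtype
    n_bytes = d.size * dtype.itemsize   # size of the full dataset in bytes
    part = d[:, :, :5]                  # reads only this slice into a numpy array

print(shape, dtype, n_bytes, part.shape)
```

Note that slicing must happen while the file is still open; the resulting numpy array stays valid after the file is closed.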

The dataset d behaves like an array, but it is not actually a numpy array.

Fetch the whole dataset with:

In [798]: arr = d[:]
In [799]: type(arr)
Out[799]: numpy.ndarray

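Exactly how h5py reads the file to fetch your slice depends on the slicing, the data layout, chunking, and other details that are generally not under your control and that you should not need to worry about.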
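
If the dataset is too big for memory, one common approach is to slice it in blocks along the first axis and process each block in turn. The following is only a sketch under assumptions: the file name big.h5, the dataset name data, the chunk shape, and the block size of 250 rows are all invented for the demo, and the running sum stands in for whatever per-block work you need.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file, dataset name, and chunk shape, chosen only for the demo.
path = os.path.join(tempfile.mkdtemp(), "big.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("data", data=np.ones((1000, 4)), chunks=(100, 4))

block_rows = 250   # assumed block size; pick one that fits comfortably in memory
total = 0.0
with h5py.File(path, "r") as f:
    d = f["data"]
    n_rows = d.shape[0]                        # known without reading the data
    for start in range(0, n_rows, block_rows):
        block = d[start:start + block_rows]    # numpy array with at most block_rows rows
        total += block.sum()                   # process the block, then let it go

print(total)
```

Only one block is held as a numpy array at any time, so peak memory is governed by block_rows rather than by the size of the dataset.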

Note also that when reading one dataset, I am not loading the others. The same applies to groups.
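
To illustrate the point about groups, here is a small sketch that walks a file and records the shape of every dataset without reading any array data. The file layout (dset at the top level, inner inside a group grp) and the helper name record are assumptions for the example; visititems is the h5py method that calls the helper once per object in the file.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file layout, built only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "nested.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("dset", data=np.zeros((4,)))
    g = f.create_group("grp")
    g.create_dataset("inner", data=np.zeros((2, 3)))

shapes = {}

def record(name, obj):
    # Called for every group and dataset; keep only datasets.
    if isinstance(obj, h5py.Dataset):
        shapes[name] = obj.shape   # metadata only, no data is loaded

with h5py.File(path, "r") as f:
    f.visititems(record)

print(shapes)
```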

http://docs.h5py.org/en/latest/high/dataset.html#reading-writing-data

Read from a large file without loading the whole thing into memory using h5py
