Question

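My Python code receives a byte array that represents the bytes of an HDF5 file.

I want to read this byte array into an in-memory h5py file object without first writing the byte array to disk. This page says I can open a memory-mapped file, but it would be a new, empty file. I want to go from the byte array to an in-memory HDF5 file, use it, and discard it, without writing to disk at any point.

Is it possible to do this with h5py? (Or with the HDF5 C library, if that is the only way.)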

Answer 1

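I would also very much like to be able to create an h5py.File object from in-memory data, the same way as from an existing Python file object, but I see no indication that h5py.File accepts a file object as its argument.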

>>> import io, h5py
>>> f = io.BytesIO(open('test.h5', 'rb').read())
>>> h5py.File(f, 'r')
AttributeError: '_io.BytesIO' object has no attribute 'encode'

h5py.File(open('test.h5'), 'r') gives a similar error. I also see no way to open a new memory-mapped HDF5 file and "dump" a byte stream into it.

Answer 2

The following example uses tables (PyTables) instead of h5py; it can still read and manipulate the HDF5 format.

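import urllib.request
import tables

url = 'https://s3.amazonaws.com/<your bucket>/data.hdf5'
response = urllib.request.urlopen(url)
# Load the downloaded bytes as an in-memory file image via the HDF5 core driver;
# driver_core_backing_store=0 keeps the image from being written to disk.
h5file = tables.open_file("data-sample.h5", driver="H5FD_CORE",
                          driver_core_image=response.read(),
                          driver_core_backing_store=0)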

Answer 3

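You can create an h5py file object from io.BytesIO or tempfile, as shown in the official documentation at http://docs.h5py.org/en/stable/high/file.html#python-file-like-objects.

The first argument to File may be a Python file-like object, such as an io.BytesIO or tempfile.TemporaryFile instance. This is a convenient way to create temporary HDF5 files, e.g. for testing or to send over the network.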

tempfile.TemporaryFile

>>> import tempfile, h5py
>>> tf = tempfile.TemporaryFile()
>>> f = h5py.File(tf, 'w')

Or io.BytesIO

"""Create an HDF5 file in memory and retrieve the raw bytes

This could be used, for instance, in a server producing small HDF5
files on demand.
"""
import io
import h5py

bio = io.BytesIO()
with h5py.File(bio, 'w') as f:
    f['dataset'] = range(10)

data = bio.getvalue() # data is a regular Python bytes object.
print("Total size:", len(data))
print("First bytes:", data[:10])

Answer 4

You can try creating a file object with binary I/O and reading it through h5py:

import io, h5py

f = io.BytesIO(YOUR_H5PY_STREAM)  # the HDF5 file contents as a bytes object
h = h5py.File(f, 'r')
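
For a complete round trip, here is a minimal sketch. It assumes an h5py version that accepts file-like objects (per the documentation linked in Answer 3) and that NumPy is installed. The helper names make_hdf5_bytes and read_dataset are made up for illustration; make_hdf5_bytes only stands in for whatever bytes your code actually receives.

```python
import io

import h5py
import numpy as np


def make_hdf5_bytes() -> bytes:
    """Build a small HDF5 file in memory and return its raw bytes.

    Hypothetical stand-in for the byte array received from elsewhere.
    """
    buf = io.BytesIO()
    with h5py.File(buf, "w") as f:
        f["dataset"] = np.arange(10)
    return buf.getvalue()


def read_dataset(raw: bytes, name: str) -> np.ndarray:
    """Open HDF5 bytes through io.BytesIO and copy one dataset into memory."""
    with h5py.File(io.BytesIO(raw), "r") as f:
        # [()] reads the whole dataset into a NumPy array before the file closes.
        return f[name][()]


raw_bytes = make_hdf5_bytes()
values = read_dataset(raw_bytes, "dataset")
print(values.tolist())
```

Nothing here touches the disk: the BytesIO buffer is discarded once the with block exits, and the returned array is an independent copy of the data.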

Can h5py load a file from a byte array in memory?
