Question

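I am trying to resize a dataset and store new values using the h5py package in Python. My dataset keeps growing at every time step, and I would like to append to the .h5 file using the resize function. However, I run into errors with my approach. The variable dset is an array of datasets.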

import os
import h5py
import numpy as np

path = './out.h5'
os.remove(path)

def create_h5py(path):
    with h5py.File(path, "a") as hf:
        grp = hf.create_group('left')
        dset = []
        dset.append(grp.create_dataset('voltage', (10**4,3), maxshape=(None,3), dtype='f', chunks=(10**4,3)))
        dset.append(grp.create_dataset('current', (10**4,3), maxshape=(None,3), dtype='f', chunks=(10**4,3)))
    return dset

if __name__ == '__main__':
    dset = create_h5py(path)
    for i in range(3):
        if i == 0:
            dset[0][:] = np.random.random(dset[0].shape)
            dset[1][:] = np.random.random(dset[1].shape)
        else:
            dset[0].resize(dset[0].shape[0]+10**4, axis=0)
            dset[0][-10**4:] = np.random.random((10**4,3))
            dset[1].resize(dset[1].shape[0]+10**4, axis=0)
            dset[1][-10**4:] = np.random.random((10**4,3))

Edit

Thanks to tel, I was able to solve this. The fix was to replace the with h5py.File(path, "a") as hf: statement inside create_h5py, so that the file is no longer closed when the function returns.

Answer 1

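Problem

I'm not sure about the rest of your code, but you cannot use the context manager pattern (i.e. with h5py.File(foo) as bar:) in a function that returns a dataset. As was pointed out in the comments under your question, this means the actual HDF5 file will already be closed by the time you try to access the dataset. Dataset objects in h5py are like live views into the file, so the file has to remain open for them to be usable. That is why you get the error.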
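A minimal sketch of this behaviour, assuming a throwaway file in a temporary directory. The file name demo.h5 and the helper broken_create are hypothetical and only serve the illustration. It relies on h5py objects being truthy only while they are still accessible:

```python
import os
import tempfile

import h5py


def broken_create(path):
    # Hypothetical helper that mirrors the pattern from the question:
    # the file is closed as soon as the `with` block ends.
    with h5py.File(path, "w") as hf:
        dset = hf.create_dataset("voltage", (10, 3), maxshape=(None, 3), dtype="f")
        open_inside = bool(dset)  # the file is still open at this point
    return dset, open_inside


with tempfile.TemporaryDirectory() as tmp:
    dset, open_inside = broken_create(os.path.join(tmp, "demo.h5"))
    # The Python object still exists, but the file behind it is closed.
    usable_after = bool(dset)
```

Checking bool(dset) before use is a cheap way to find out whether a dataset handle has outlived its file.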

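Solution

It is a good idea to always interact with the file in a managed context (i.e. inside a with clause). If your code raises an error, the context manager will (almost always) ensure that the file is closed. This helps avoid potential data loss from a crash.

In your case, you can have your cake (encapsulate the dataset creation routine in a separate function) and eat it too (interact with the HDF5 file in a managed context) by writing your own context manager.

It's actually quite simple. Any Python object that implements the __enter__ and __exit__ methods is a valid context manager. Here is a complete working version: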
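Before the full version, here is a stripped-down sketch of the protocol itself, using only plain Python. The Resource and Managed classes are hypothetical stand-ins (not part of h5py) that show __exit__ running even when the body of the with block raises:

```python
class Resource:
    """Hypothetical stand-in for an open file handle."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class Managed:
    def __init__(self):
        self.resource = Resource()

    def __enter__(self):
        # The return value is what `as name` binds to.
        return self.resource

    def __exit__(self, exc_type, exc_value, tb):
        # Runs on normal exit and when an exception escapes the block.
        self.resource.close()
        return False  # do not swallow the exception


manager = Managed()
try:
    with manager as res:
        raise RuntimeError("simulated failure inside the with block")
except RuntimeError:
    pass

closed_after_error = manager.resource.closed
```

Returning False from __exit__ lets the exception propagate to the caller after cleanup, which is usually what you want for file handling.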

import os
import h5py
import numpy as np

path = './out.h5'
try:
    os.remove(path)
except OSError: 
    pass

class H5PYManager:
    def __init__(self, path, method='a'):
        self.hf = h5py.File(path, method)

    def __enter__(self):
        # when you call `with H5PYManager(foo) as bar`, the return of this method will be assigned to `bar`
        return self.create_datasets()

    def __exit__(self, type, value, traceback):
        # this method gets called when you exit the `with` clause, including when an error is raised
        self.hf.close()    

    def create_datasets(self):
        grp = self.hf.create_group('left')
        return [grp.create_dataset('voltage', (10**4,3), maxshape=(None,3), dtype='f', chunks=(10**4,3)),
                grp.create_dataset('current', (10**4,3), maxshape=(None,3), dtype='f', chunks=(10**4,3))]

if __name__ == '__main__':
    with H5PYManager(path) as dset:
        for i in range(3):
            if i == 0:
                dset[0][:] = np.random.random(dset[0].shape) 
                dset[1][:] = np.random.random(dset[1].shape)
            else:
                dset[0].resize(dset[0].shape[0]+10**4, axis=0)
                dset[0][-10**4:] = np.random.random((10**4,3))
                dset[1].resize(dset[1].shape[0]+10**4, axis=0)
                dset[1][-10**4:] = np.random.random((10**4,3))
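As a variation, the same idea can be written with the standard library's contextlib.contextmanager decorator instead of a class. This is a different technique from the class above, sketched here under some assumptions: the helper name open_datasets, the smaller dataset size and the temporary file are all hypothetical choices made for the illustration.

```python
import os
import tempfile
from contextlib import contextmanager

import h5py
import numpy as np


@contextmanager
def open_datasets(path, mode="a"):
    # Hypothetical helper: yields the two datasets while the file is open.
    with h5py.File(path, mode) as hf:
        grp = hf.create_group("left")
        yield [
            grp.create_dataset(name, (100, 3), maxshape=(None, 3), dtype="f")
            for name in ("voltage", "current")
        ]
    # Leaving the inner `with` closes the file, even if the caller raised.


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.h5")
    with open_datasets(path) as dsets:
        for d in dsets:
            d[:] = np.random.random(d.shape)
        shapes = [d.shape for d in dsets]
```

The generator form is shorter, while the class form makes it easier to add more methods to the manager later.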

Answer 2

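@tel has provided an elegant solution to the problem. I outlined a simpler approach in the comments below his answer. It is easier to code (and understand) for a beginner. Basically, it makes a few small changes to @Maxtron's original code. The modifications are:

Move with h5py.File(path, "a") as hf: into the __main__ routine
Pass hf into create_h5py(hf)
I also added a test before os.remove() to avoid an error if the h5 file does not exist

My suggested modifications are below: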

import h5py, os
import numpy as np

path = './out.h5'
# test existence of H5 file before deleting
if os.path.isfile(path):
    os.remove(path)

def create_h5py(hf):
    grp = hf.create_group('left')
    dset = []
    dset.append(grp.create_dataset('voltage', (10**4,3), maxshape=(None,3), dtype='f', chunks=(10**4,3)))
    dset.append(grp.create_dataset('current', (10**4,3), maxshape=(None,3), dtype='f', chunks=(10**4,3)))
    return dset

if __name__ == '__main__':

    with h5py.File(path, "a") as hf:
        dset = create_h5py(hf)
        for i in range(3):

            if i == 0:
                dset[0][:] = np.random.random(dset[0].shape) 
                dset[1][:] = np.random.random(dset[1].shape)
            else:
                dset[0].resize(dset[0].shape[0]+10**4, axis=0)
                dset[0][-10**4:] = np.random.random((10**4,3))
                dset[1].resize(dset[1].shape[0]+10**4, axis=0)
                dset[1][-10**4:] = np.random.random((10**4,3))
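The resize-then-write steps repeated in the loop above can also be factored into a small helper. The following is only a sketch under assumptions: append_rows is a hypothetical name, the dataset starts empty so that no special case is needed for the first iteration, and a temporary file is used instead of ./out.h5.

```python
import os
import tempfile

import h5py
import numpy as np


def append_rows(dset, rows):
    """Grow `dset` along axis 0 and write `rows` into the new space."""
    rows = np.asarray(rows)
    old = dset.shape[0]
    dset.resize(old + rows.shape[0], axis=0)
    dset[old:] = rows


with tempfile.TemporaryDirectory() as tmp:
    with h5py.File(os.path.join(tmp, "demo.h5"), "w") as hf:
        # Start empty; maxshape=(None, 3) makes axis 0 resizable.
        dset = hf.create_dataset("voltage", (0, 3), maxshape=(None, 3), dtype="f")
        for _ in range(3):
            append_rows(dset, np.random.random((100, 3)))
        final_shape = dset.shape
```

Because the helper receives an open dataset, it works with either answer's approach as long as it is called while the file is still open.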

Resizing and saving a dataset in .h5 format using h5py in Python
