How can I loop over the groups of an HDF5 file in Python and remove rows according to a mask?

Time: 2018-04-25 16:59:16

Tags: python traversal hdf5 h5py

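I have an HDF5 file containing many different groups, all of which have the same number of rows. I also have a boolean mask that marks which rows to keep and which to remove. I want to iterate over all groups in the HDF5 file and remove rows according to the mask.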

The recommended method for recursively visiting all groups is visit(callable), but I can't work out how to pass my mask to the callable.

Here is some code that hopefully shows what I want to do, but which doesn't work:

def apply_mask(name, *args):
    h5obj[name] = h5obj[name][mask]

with h5py.File(os.path.join(directory, filename), 'r+') as h5obj:
    h5obj.visit(apply_mask, mask)

This results in the error

TypeError: visit() takes 2 positional arguments but 3 were given

How can I get my mask array into this function?

1 answer:

Answer 0 (score: 1)

I eventually got this working through a series of hacky workarounds. If there is a better solution, I'd be interested to hear about it!

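import os

import h5py
import numpy as np

# directory, filename and mask are defined as in the question
with h5py.File(os.path.join(directory, filename), 'r+') as h5obj: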
    # Use the visit callable to append to a list of key names
    h5_keys = []
    h5obj.visit(h5_keys.append)
    # Then loop over those keys and, if they're datasets rather than
    # groups, remove the invalid rows
    for h5_key in h5_keys:
        if isinstance(h5obj[h5_key], h5py.Dataset):
            tmp = np.array(h5obj[h5_key])[mask]
            # There is no way to simply change the dataset because its
            # shape is fixed, causing a broadcast error, so it is
            # necessary to delete and then recreate it.
            del h5obj[h5_key]
            h5obj.create_dataset(h5_key, data=tmp)
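
On the original question of getting the mask (or any extra argument) into the callable: one possible approach, sketched here rather than offered as the definitive answer, is to bind the extra argument with functools.partial and use visititems, which calls the callback with (name, object) instead of just the name. The function names collect_datasets and apply_mask_to_file are made up for this illustration. The sketch still collects the dataset names first and rewrites them after the traversal, on the assumption that it is safer not to delete and recreate objects while h5py is walking the file. It also copies attributes across, since deleting a dataset discards them; storage settings such as chunking or compression are not carried over.

```python
from functools import partial

import h5py
import numpy as np


def collect_datasets(found, name, obj):
    """visititems callback: record the name of every dataset visited."""
    if isinstance(obj, h5py.Dataset):
        found.append(name)
    # Returning None tells h5py to keep visiting.


def apply_mask_to_file(path, mask):
    """Keep only the rows of each dataset where mask is True."""
    mask = np.asarray(mask, dtype=bool)
    with h5py.File(path, "r+") as h5obj:
        names = []
        # partial binds the extra argument up front, so h5py can still
        # call the callback with the (name, obj) pair it expects.
        h5obj.visititems(partial(collect_datasets, names))

        # Rewrite the datasets only after the traversal has finished.
        for name in names:
            old = h5obj[name]
            kept = old[...][mask]       # read into memory, then mask rows
            attrs = dict(old.attrs)     # deleting the dataset drops these
            del h5obj[name]
            new = h5obj.create_dataset(name, data=kept)
            for key, value in attrs.items():
                new.attrs[key] = value
```

With the variables from the question, this would be called as apply_mask_to_file(os.path.join(directory, filename), mask). A closure or lambda that captures the extra argument would work just as well as partial; the point is only that visit and visititems accept any callable with the expected signature.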