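I'm new to HDF5, and I'm trying to create a dataset with a compound type that has three columns: MD5, size, and another dataset.
How can I do this?
I tried the following code: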
import h5py
import numpy as np
dbfile = h5py.File("test.h5",'w')
dtype1 = h5py.Dataset('myset', (100,))
dtype2 = np.dtype([
('MD5', np.str_, 32),
('size', "i8"),
('timestep0', dtype1)
])
records = dbfile.create_dataset('records', (4,), dtype2)
I get the error:
TypeError: __init__() takes exactly 2 arguments (3 given)
which refers to:
dtype1 = h5py.Dataset('myset', (100,))
Answer 0 (score: 0)
h5py.Dataset('myset', (100,))
tries to create a Dataset object directly (by calling its __init__?). But according to the reference:
http://docs.h5py.org/en/latest/high/dataset.html#reference
class Dataset(identifier)
Dataset objects are typically created via Group.create_dataset(), or by
retrieving existing datasets from a file. Call this constructor to
create a new Dataset bound to an existing DatasetID identifier.
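As a minimal sketch of what the quoted reference describes, the usual route is Group.create_dataset(), and the Dataset constructor only rebinds an existing DatasetID. The file name and the in-memory driver="core" setting are assumptions made so the example leaves nothing on disk.

```python
import h5py

# In-memory file (assumption: chosen so the sketch writes nothing to disk).
with h5py.File("sketch_create.h5", "w", driver="core", backing_store=False) as f:
    # The normal way to get a Dataset: ask the file (the root group) to create it.
    ds = f.create_dataset("myset", shape=(100,), dtype="f4")

    # The constructor takes one argument: the low-level DatasetID of an
    # existing dataset. It does not accept a name and a shape.
    same = h5py.Dataset(ds.id)

    created_shape = ds.shape
    same_name = same.name
```

Both ds and same refer to the same dataset in the file; passing a name and a shape to the constructor is what triggers the TypeError in the question.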
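Even if you could obtain such an object (I still don't understand how), it would not work inside an np.dtype. For example, if I substitute datetime.datetime for it, the resulting field has dtype='O':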
In [503]: dtype2 = np.dtype([
...: ('MD5', np.str_, 32),
...: ('size', "i8"),
...: ('timestep0', datetime.datetime)
...: ])
In [504]: dtype2
Out[504]: dtype([('MD5', '<U32'), ('size', '<i8'), ('timestep0', 'O')])
numpy dtypes cover a defined set of types, such as strings, integers, and floats, plus object (but not lists, dicts, or other Python classes).
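I can save a compound dtype with h5py, but I cannot save object dtypes. There is an h5py dtype that gets loaded into a numpy object dtype, but in general it does not work in the other direction.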
http://docs.h5py.org/en/latest/special.html#variable-length-strings
hdf5 can't write numpy array of object type
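A small sketch of that asymmetry, assuming an in-memory file and made-up dataset names: a plain numpy object array is rejected on write, while the h5py variable-length string dtype (which numpy also sees as an object dtype) is accepted.

```python
import h5py
import numpy as np

# In-memory file (assumption: chosen so the sketch writes nothing to disk).
with h5py.File("sketch_objects.h5", "w", driver="core", backing_store=False) as f:
    # A generic object array has no HDF5 equivalent, so the write is refused.
    mixed = np.array([1, "two", None], dtype=object)
    try:
        f.create_dataset("mixed", data=mixed)
        object_write_failed = False
    except TypeError:
        object_write_failed = True

    # The special variable-length string dtype is an object dtype on the
    # numpy side, but h5py knows how to map it to an HDF5 type.
    str_dtype = h5py.special_dtype(vlen=str)
    names = f.create_dataset("names", (3,), dtype=str_dtype)
    names[:] = ["a", "bb", "ccc"]
    loaded_kind = names.dtype.kind
```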
http://docs.h5py.org/en/latest/refs.html - Object references
In [7]: import h5py
In [8]: f = h5py.File('wtihref.h5','w')
In [9]: ds0 = f.create_dataset('dset0',np.arange(10))
In [10]: ds1 = f.create_dataset('dset1',np.arange(11))
In [11]: ds2 = f.create_dataset('dset2',np.arange(12))
In [12]: ds2.ref
Out[12]: <HDF5 object reference>
In [13]: ref_dtype = h5py.special_dtype(ref=h5py.Reference)
In [14]: ref_dtype
Out[14]: dtype('O')
In [16]: rds = f.create_dataset('refdset', (5,), dtype=ref_dtype)
In [17]: rds[:3]=[ds0.ref, ds1.ref, ds2.ref]
In [28]: [f[r] for r in rds[:3]]
Out[28]:
[<HDF5 dataset "dset0": shape (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), type "<f4">,
<HDF5 dataset "dset1": shape (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10), type "<f4">,
<HDF5 dataset "dset2": shape (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11), type "<f4">]
Using a compound dtype:
In [55]: dt2 = np.dtype([('x',int),('y','S12'),('z',ref_dtype)])
In [56]: rds1 = f.create_dataset('refdtype', (5,), dtype=dt2)
In [72]: rds1[0]=(0,b'ONE',ds0.ref)
In [75]: rds1[1]=(1,b'two',ds1.ref)
In [76]: rds1[2]=(2,b'three',ds2.ref)
In [82]: rds1[:3]
Out[82]:
array([(0, b'ONE', <HDF5 object reference>),
(1, b'two', <HDF5 object reference>),
(2, b'three', <HDF5 object reference>)],
dtype=[('x', '<i4'), ('y', 'S12'), ('z', 'O')])
In [83]: f[rds1[0]['z']]
Out[83]: <HDF5 dataset "dset0": shape (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), type "<f4">
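Applied to the question's three columns, the same approach might look like the sketch below. The field layout (a fixed 32-byte string for the MD5 hex digest, an 8-byte integer size, and an object reference standing in for "another dataset") is an assumption about what the asker wants, and the file name, the payload array, and the in-memory driver are illustrative only.

```python
import hashlib

import h5py
import numpy as np

# Reference dtype, as in the session above.
ref_dtype = h5py.special_dtype(ref=h5py.Reference)

# Assumed layout for the question's record: MD5 hex digest, size, reference.
rec_type = np.dtype([("MD5", "S32"), ("size", "i8"), ("timestep0", ref_dtype)])

# In-memory file (assumption: chosen so the sketch writes nothing to disk).
with h5py.File("sketch_records.h5", "w", driver="core", backing_store=False) as f:
    payload = np.arange(100, dtype="f4")
    target = f.create_dataset("myset", data=payload)

    records = f.create_dataset("records", (4,), dtype=rec_type)
    digest = hashlib.md5(payload.tobytes()).hexdigest().encode("ascii")
    records[0] = (digest, payload.nbytes, target.ref)

    # Read the row back and follow the stored reference to the dataset.
    row = records[0]
    linked_name = f[row["timestep0"]].name
    stored_size = int(row["size"])
    stored_md5 = row["MD5"]
```

The record does not contain the other dataset itself; it stores a pointer to it, which is dereferenced by indexing the file with the reference.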
h5py uses the metadata attribute of the dtype to store the reference information.
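As a rough illustration, the tag can be inspected directly: a reference dtype is a numpy object dtype with a 'ref' entry in its metadata, while a plain object dtype has no metadata at all. The variable names here are made up for the sketch.

```python
import h5py
import numpy as np

ref_dtype = h5py.special_dtype(ref=h5py.Reference)
plain_object = np.dtype("O")

# Both print as dtype('O'); only the metadata tells them apart.
ref_tag = ref_dtype.metadata["ref"]
plain_metadata = plain_object.metadata

# h5py also offers a helper that reads the same tag.
detected = h5py.check_dtype(ref=ref_dtype)
```

This is why a hand-built np.dtype('O') cannot be written while ref_dtype can: h5py looks for the tag when it translates the dtype to an HDF5 type.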