Save a string in an h5py dataset using the type "array of 8-bit integer (80)"

Time: 2019-03-21 16:40:26

Tags: h5py

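I want to create an h5py "string" dataset (for example "A") whose data type is "array of 8-bit integer (80)", as HDFView displays it. Each integer in this length-80 array is the ord(x) of the corresponding character of the string. For example, Top is stored as 84 111 112 0 0 0 ..., for a total of 80 int8 values.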
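The encoding described above can be sketched with NumPy alone. The helper names `encode_fixed` and `decode_fixed` and the constant `FIELD_LEN` are hypothetical, introduced only for this illustration, and the sketch assumes the string is plain ASCII so that every code fits in a signed 8-bit integer.

```python
import numpy as np

FIELD_LEN = 80  # fixed width taken from the question


def encode_fixed(text, length=FIELD_LEN):
    """Return `length` int8 values: ord() of each character, zero-padded."""
    codes = [ord(ch) for ch in text]
    if len(codes) > length:
        raise ValueError("string is longer than the fixed field")
    if any(c > 127 for c in codes):
        raise ValueError("only 7-bit ASCII fits in a signed int8")
    out = np.zeros(length, dtype=np.int8)
    out[:len(codes)] = codes
    return out


def decode_fixed(values):
    """Inverse of encode_fixed: read characters up to the first zero."""
    chars = []
    for v in values:
        if v == 0:
            break
        chars.append(chr(int(v)))
    return "".join(chars)


encoded = encode_fixed("Top")
print(encoded[:5])            # first values: 84, 111, 112, then padding zeros
print(decode_fixed(encoded))  # Top
```

This only prepares the 80 values; it does not by itself give the dataset the H5T_ARRAY type.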

The desired dataset should look like this:

DATASET "NOM" {
                     DATATYPE  H5T_ARRAY { [80] H5T_STD_I8LE }
                     DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
                     DATA {
                     (0): [ 84, 111, 112, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]
                     }
                  }

However, I have not been able to create this dataset with h5py. With a standard numpy array I get this instead:

DATASET "NOM" {
                     DATATYPE  H5T_STD_I8LE
                     DATASPACE  SIMPLE { ( 1, 80 ) / ( 1, 80 ) }
                     DATA {
                     (0,0): 84, 111, 112, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                     (0,15): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                     (0,31): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                     (0,47): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                     (0,63): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                     (0,79): 0
                     }
                  }

If my string is "Top", what should data and dtype be in the following call?

.create_dataset("NOM", data=data, dtype=dtype)

According to here, maybe I need to use the lower-level interface...?

Thanks!

Solution

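The problem is that if we build the numpy array first and pass it as data to .create_dataset("NOM", data=data), numpy internally always interprets my 80int8 dtype as int8:

dtype = np.dtype("80int8")
x = np.array(2, dtype=dtype)
# x.dtype = dtype('int8')

So the solution is to first declare the dataset with the desired dtype and then fill in the data.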
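A minimal sketch of that "declare first, fill afterwards" idea follows. It is not the original poster's exact code, which did not survive on this page. The file name `demo.h5` and the variable names are assumptions, and the file uses h5py's in-memory `core` driver with `backing_store=False` so that nothing is written to disk.

```python
import numpy as np
import h5py

dtype = np.dtype("80int8")  # subarray dtype: 80 int8 values per element

# NumPy on its own folds the subarray into the array shape.
x = np.zeros(1, dtype=dtype)
print(x.dtype, x.shape)  # int8 (1, 80)

# Assumption: an in-memory file is enough for illustration.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as hf:
    # Declare the dataset with the desired dtype, without passing data.
    ds = hf.create_dataset("NOM", shape=(1,), dtype=dtype)

    # Build the 80 values for "Top" and write them into element 0.
    text = "Top"
    row = np.zeros(80, dtype=np.int8)
    row[:len(text)] = [ord(ch) for ch in text]
    ds[0] = row

    # Copy what we want to inspect before the file is closed.
    ds_shape = ds.shape
    ds_dtype = ds.dtype
    stored = ds[0]

print(ds_shape)     # one element in the dataspace
print(ds_dtype)     # h5py reports the subarray dtype here
print(stored[:4])   # first values read back
```

Because the dataset is created from `shape` and `dtype` rather than from an existing array, the subarray dtype reaches h5py intact instead of being flattened by NumPy beforehand.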

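1 Answer:

Answer 0: (score: 0)

Make a uint8 array with the correct size and content (use 'int8' instead if the stored type has to be signed, as in the question):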

In [417]: x = np.zeros(80, dtype='uint8')                                                 
In [419]: x[:3]=[ord(i) for i in 'Top']                                                                                                                                
In [421]: ds1=hf.create_dataset('other4', data=x) 

The structured array approach:

In [486]: dt = np.dtype([('f0','80int8')])                                                
In [487]: dt                                                                              
Out[487]: dtype([('f0', 'i1', (80,))])
In [488]: x = np.zeros(1, dt)                                                             
In [489]: x['f0'][0][:3]=[ord(i) for i in 'Top']                                          
In [490]: x                                                                               
Out[490]: 
array([([ 84, 111, 112,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],)],
      dtype=[('f0', 'i1', (80,))])
In [491]: ds1=hf.create_dataset('st1', data=x)                                            
In [492]: ds1                                                                             
Out[492]: <HDF5 dataset "st1": shape (1,), type "|V80">

which produces

   DATASET "st1" {
      DATATYPE  H5T_COMPOUND {
         H5T_ARRAY { [80] H5T_STD_I8LE } "f0";
      }
      DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
      DATA {
      (0): {
            [ 84, 111, 112, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]
         }
      }
   }
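To read such a compound dataset back and recover the string, something like the following sketch could be used. It repeats the structured-array setup from the session above so that it runs on its own. The file name `demo_struct.h5` is an assumption, and the file again lives only in memory through the `core` driver.

```python
import numpy as np
import h5py

# One field "f0" holding an 80-element int8 subarray, as in the session above.
dt = np.dtype([("f0", "80int8")])
x = np.zeros(1, dt)
x["f0"][0][:3] = [ord(ch) for ch in "Top"]

# Assumption: in-memory file, nothing is written to disk.
with h5py.File("demo_struct.h5", "w", driver="core", backing_store=False) as hf:
    hf.create_dataset("st1", data=x)
    records = hf["st1"][...]   # structured array with one record

codes = records["f0"][0]       # the 80 int8 values of the first record

# Drop the trailing zero padding and decode the remaining bytes.
text = codes.tobytes().rstrip(b"\x00").decode("ascii")
print(text)  # Top
```

Note that this layout wraps the array in H5T_COMPOUND, so it is close to the requested type but not identical to a bare H5T_ARRAY dataset.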