h5py Unicode support with dictionaries

Date: 2018-02-01 22:04:55

Tags: python unicode h5py

So, when using h5py, unicode is not supported in attributes. It throws the error: TypeError: No conversion path for dtype: dtype('

I have seen the recommended workaround, which is to encode the strings before storing them, like this: ( f.attrs['x'] = [a.encode('utf8') for a in mylist] )

However, I don't understand how to encode the strings when they are stored in a dictionary.

My code is as follows:

with sess.as_default():

        if vae_checkpoint:
            print('Restoring VAE checkpoint: %s' % vae_checkpoint)
            saver.restore(sess, vae_checkpoint)

        nrof_images = len(image_list)
        nrof_batches = int(math.ceil(len(image_list) / args.batch_size))
        latent_vars = np.zeros((nrof_images, args.latent_var_size))
        attributes = np.zeros((nrof_images, nrof_attributes))
        for i in range(nrof_batches):
            start_time = time.time()
            latent_var_, attribs_, indices_ = sess.run([latent_var, attribs, indices])
            latent_vars[indices_,:] = latent_var_
            attributes[indices_,:] = attribs_
            duration = time.time() - start_time
            print('Batch %d/%d: %.3f seconds' % (i+1, nrof_batches, duration))
        # NOTE: This will print the 'Out of range' warning if the last batch is not full,
        #  as described by https://github.com/tensorflow/tensorflow/issues/8330

        # Calculate average change in the latent variable when each attribute changes
        attribute_vectors = np.zeros((nrof_attributes, args.latent_var_size), np.float32)
        for i in range(nrof_attributes):
            pos_idx = np.argwhere(attributes[:,i]==1)[:,0]
            neg_idx = np.argwhere(attributes[:,i]==-1)[:,0]
            pos_avg = np.mean(latent_vars[pos_idx,:], 0)
            neg_avg = np.mean(latent_vars[neg_idx,:], 0)
            attribute_vectors[i,:] = pos_avg - neg_avg

        filename = os.path.expanduser(args.output_filename)
        print('Writing attribute vectors, latent variables and attributes to %s' % filename)
        mdict = {'latent_vars':latent_vars, 'attributes':attributes,
                 'fields':fields, 'attribute_vectors':attribute_vectors }
        with h5py.File(filename, 'w') as f:
            for key, value in iteritems(mdict):
                f.create_dataset(key, data=value)

How can I encode the strings when they are stored in the dictionary above? I am on Python 3.3.

Thanks, Doug

1 Answer:

Answer 0: (score: 1)

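According to the docs, every string value must be encoded as a byte string. For a single string value, that means value.encode('utf-8'); for a container of strings (mainly lists and tuples), it means iterating over the values and encoding each one.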
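As a small illustration of that rule, independent of h5py, the sketch below encodes one string and one list of strings and then decodes the list again. The variable names and sample values are made up for the example and do not come from the question.

```python
# A single (unicode) string becomes a byte string with .encode().
name = 'caf\u00e9'  # hypothetical value containing a non-ASCII character
encoded = name.encode('utf-8')

# A container of strings has to be encoded element by element.
labels = ['Smiling', 'Young']  # hypothetical values
encoded_labels = [s.encode('utf-8') for s in labels]

# Decoding reverses the step, e.g. after the values are read back.
decoded_labels = [b.decode('utf-8') for b in encoded_labels]

print(type(encoded).__name__, encoded)
print(encoded_labels)
print(decoded_labels == labels)
```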

In this case, the value fields is a list of (unicode) strings. So the solution is to convert it into a list of byte strings:

# ...
fields = [v.encode('utf-8') for v in fields]
mdict = {'latent_vars':latent_vars, 'attributes':attributes, 'fields':fields, 'attribute_vectors':attribute_vectors}
# ...

The remaining variables are numeric numpy arrays, so they are fine as they are.
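If it is not known in advance which entries of the dictionary hold strings, the same idea can be applied to every value. The following is only a sketch: the helper name encode_strings is hypothetical, and the sample dictionary is placeholder data standing in for the question's mdict.

```python
def encode_strings(mapping, encoding='utf-8'):
    """Return a copy of mapping in which str values, and str items inside
    list/tuple values, are encoded to bytes. Other values pass through."""
    def _encode(value):
        if isinstance(value, str):
            return value.encode(encoding)
        if isinstance(value, (list, tuple)):
            return [_encode(v) for v in value]
        # Anything else (numbers, numpy arrays, ...) is returned unchanged.
        return value
    return {key: _encode(value) for key, value in mapping.items()}


# Placeholder data standing in for the question's mdict.
mdict = {'fields': ['Smiling', 'Young'], 'attributes': [[1.0, -1.0]]}
safe_mdict = encode_strings(mdict)
print(safe_mdict['fields'])
print(safe_mdict['attributes'])
```

The returned dictionary could then be used in the question's write loop in place of mdict, so that each f.create_dataset(key, data=value) call only sees byte strings and numeric data.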