Question

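I'm having a hard time finding resources on this conversion. In some sample code I've seen, the input data is in .pkl form, while the MNIST dataset is in .idx3-ubyte. The dataset formats used for computer vision vary a lot, and I'm not familiar with any of them, so I would appreciate any help solving this. Thanks.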

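Update: I now successfully write my images into .tfrecords format with the code below, but since that format doesn't seem to be readable by the CNN code I'm using, I'm still trying to modify the code to produce .pkl instead. However, all my attempts have failed.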

            import os
            from PIL import Image
            import tensorflow as tf

            cwd = '/Users/Downloads/tflearn_train/'
            # classify into 2 types; a list (not a set) keeps the name -> label mapping fixed between runs
            classes = ['0', '1']
            # file to be produced (tf.python_io.TFRecordWriter in older TF 1.x)
            writer = tf.io.TFRecordWriter("train.tfrecords")

            for index, name in enumerate(classes):
                class_path = os.path.join(cwd, name)
                for img_name in os.listdir(class_path):
                    # skip hidden files and Windows thumbnail caches
                    if not img_name.startswith('.') and img_name != 'Thumbs.db':
                        img_path = os.path.join(class_path, img_name)  # the path of every pic
                        img = Image.open(img_path)
                        img = img.resize((224, 224))
                        img_raw = img.tobytes()  # transform pic into raw bytes
                        example = tf.train.Example(features=tf.train.Features(feature={
                            "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[index])),
                            'img_raw': tf.train.Feature(bytes_list=tf.train.BytesList(value=[img_raw]))
                        }))
                        writer.write(example.SerializeToString())
            writer.close()

The above works fine. But I tried putting

        import cPickle  # Python 2; on Python 3 use: import pickle as cPickle

        write_file = open('train.pkl', 'wb')
        cPickle.dump(example, write_file, -1)
        cPickle.dump(example.features.feature['label'].int64_list.value, write_file, -1)
        write_file.close()

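both inside and outside the loop. So far, when reading the result back with cPickle.load, I haven't been able to produce a .pkl file that looks like the other .pkl files.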

Thanks for any input.

Answer 1

Pickle stores information about the structure of Python objects along with the data. For plain tensors this is probably unnecessary.
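If a .pkl file is still required (for instance to match sample code that calls cPickle.load), a common convention is to pickle plain numpy arrays rather than tf.train.Example protos; the mnist.pkl.gz file from the Theano deep learning tutorials, for example, stores (images, labels) pairs of numpy arrays. The sketch below assumes that convention: the helper names `build_arrays`, `save_pkl` and `load_pkl` are hypothetical, the data is synthetic, and in the question's loop each image would instead be `np.asarray(img)` on the resized PIL image, collected in the loop and dumped once afterwards.

```python
import pickle
import numpy as np

def build_arrays(samples):
    """Stack (image, label) pairs into one uint8 image array and one int64 label array.

    `samples` is any iterable of (H x W x C uint8 array, int label). In the question's
    loop, each image would be np.asarray(img) after img.resize((224, 224)).
    """
    images, labels = [], []
    for img, label in samples:
        images.append(np.asarray(img, dtype=np.uint8))
        labels.append(label)
    # np.stack needs every image to have the same shape, which is why the resize step matters
    return np.stack(images), np.asarray(labels, dtype=np.int64)

def save_pkl(path, images, labels):
    # Dump once, after the loop, as a single (images, labels) tuple
    with open(path, "wb") as f:
        pickle.dump((images, labels), f, protocol=pickle.HIGHEST_PROTOCOL)

def load_pkl(path):
    with open(path, "rb") as f:
        return pickle.load(f)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for four resized RGB images with alternating labels
    fake = [(rng.integers(0, 256, (224, 224, 3), dtype=np.uint8), i % 2) for i in range(4)]
    X, y = build_arrays(fake)
    save_pkl("train.pkl", X, y)
    X2, y2 = load_pkl("train.pkl")
    print(X2.shape, y2)  # (4, 224, 224, 3) [0 1 0 1]
```

Loading then gives back exactly two numpy arrays, which is usually what CNN sample code expects from `X, y = cPickle.load(f)`.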

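Instead, the usual approach is to dump the matrix data to a file in a binary format and then load it directly back into memory. I believe the ".idx3-ubyte" format used for the MNIST dataset is one such example.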
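The IDX files used by MNIST are this kind of raw dump with a tiny header: two zero bytes, a type byte (0x08 for unsigned bytes), a byte giving the number of dimensions, then each dimension size as a big-endian 32-bit integer, followed by the raw data. A minimal reader/writer sketch for uint8 data, assuming that layout (the function names are hypothetical):

```python
import struct
import numpy as np

def write_idx_uint8(path, arr):
    """Write a uint8 array as IDX: 0x00 0x00, type 0x08, ndim, dims (big-endian uint32), data."""
    arr = np.ascontiguousarray(arr, dtype=np.uint8)
    with open(path, "wb") as f:
        f.write(struct.pack(">BBBB", 0, 0, 0x08, arr.ndim))
        f.write(struct.pack(">" + "I" * arr.ndim, *arr.shape))
        f.write(arr.tobytes())

def read_idx_uint8(path):
    """Read an unsigned-byte IDX file back into an array of the recorded shape."""
    with open(path, "rb") as f:
        zero1, zero2, type_code, ndim = struct.unpack(">BBBB", f.read(4))
        if (zero1, zero2, type_code) != (0, 0, 0x08):
            raise ValueError("not an unsigned-byte IDX file")
        shape = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
        data = np.frombuffer(f.read(), dtype=np.uint8)
    return data.reshape(shape)
```

For an image array of shape (N, 28, 28), the first four bytes come out as 00 00 08 03, which read as one big-endian integer is 2051, the magic number at the start of MNIST's image files.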

If you are using Python and numpy, the simplest option is numpy's .npy format, which you write with np.save and read back with np.load: https://docs.scipy.org/doc/numpy-1.12.0/reference/generated/numpy.load.html.
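A minimal sketch with `np.save`/`np.load`, plus `np.savez` to keep images and labels together in one file. The .npy header records dtype and shape, so nothing has to be known in advance when loading. The file names and the all-zero placeholder data are assumptions for illustration.

```python
import numpy as np

images = np.zeros((10, 224, 224, 3), dtype=np.uint8)  # placeholder image batch
labels = np.array([0, 1] * 5, dtype=np.int64)          # placeholder labels

np.save("train_images.npy", images)                  # one array -> .npy
np.savez("train.npz", images=images, labels=labels)  # several named arrays -> .npz

X = np.load("train_images.npy")
with np.load("train.npz") as data:  # the NpzFile is closed on exit; the arrays stay usable
    X2, y2 = data["images"], data["labels"]

print(X.shape, X.dtype)  # (10, 224, 224, 3) uint8
print(y2[:4])            # [0 1 0 1]
```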

If you need to load a raw binary data dump, have a look at https://docs.scipy.org/doc/numpy/reference/generated/numpy.fromfile.html
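A raw dump written with `ndarray.tofile` has no header, so `np.fromfile` only returns a flat run of values; the dtype and shape have to be known from elsewhere. For MNIST's idx3-ubyte image files, the `offset` argument (available in numpy 1.17 and later) can skip the 16-byte header. The file names below are placeholders.

```python
import numpy as np

images = np.arange(2 * 4 * 4, dtype=np.uint8).reshape(2, 4, 4)
images.tofile("images.raw")  # raw bytes only: no dtype, no shape

flat = np.fromfile("images.raw", dtype=np.uint8)
restored = flat.reshape(-1, 4, 4)  # the reader must supply the shape
print(flat.shape, restored.shape)  # (32,) (2, 4, 4)

# MNIST images (28x28 each, 16-byte header), assuming the file is present locally:
# mnist = np.fromfile("train-images-idx3-ubyte", dtype=np.uint8, offset=16).reshape(-1, 28, 28)
```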

How to quickly convert a jpg dataset to .pkl for a CNN?
