Question

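I have a Train folder containing 2000 images of different sizes, along with a labels.csv file. Loading and resizing the images while training the network is very time-consuming, so I read some papers about h5py, which is described as a solution for this situation. I tried the following code: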

import datetime as dt
import os
from glob import glob

import cv2
import h5py
import pandas as pd

PATH = os.path.abspath(os.path.join('Data'))
SOURCE_IMAGES = os.path.join(PATH, "Train")
print("[INFO] images paths reading")
images = glob(os.path.join(SOURCE_IMAGES, "*.jpg"))
images.sort()
print("[INFO] image labels reading")
labels = pd.read_csv('Data/labels.csv')

# 1.0 for rows labelled as a car, 0.0 otherwise
train_labels = []
for i in range(len(labels["car"])):
    if labels["car"][i] == 1.0:
        train_labels.append(1.0)
    else:
        train_labels.append(0.0)

data_order = 'tf'  # 'th' = channels first, 'tf' = channels last

if data_order == 'th':
    train_shape = (len(images), 3, 224, 224)
else:
    train_shape = (len(images), 224, 224, 3)
print("[INFO] h5py file created")

hf = h5py.File('data.hdf5', 'w')

hf.create_dataset("train_img",
                  shape=train_shape,
                  maxshape=train_shape,
                  compression="gzip",
                  compression_opts=9)

hf.create_dataset("train_labels",
                  shape=(len(train_labels),),
                  maxshape=(None,),
                  compression="gzip",
                  compression_opts=9)

hf["train_labels"][...] = train_labels

print("[INFO] read and size images")
for i, addr in enumerate(images):
    s = dt.datetime.now()
    img = cv2.imread(addr)
    img = cv2.resize(img, (224, 224), interpolation=cv2.INTER_CUBIC)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # Write one resized RGB image into row i of the dataset
    hf["train_img"][i, ...] = img[None]
    e = dt.datetime.now()
    print("[INFO] image {} is saved time: {} second".format(i, e - s))

hf.close()

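But when I run this code, it is very fast at first and then becomes slower and slower, especially at the line hf["train_img"][i, ...] = img[None]. As the program's output shows, the time per image keeps increasing. What am I doing wrong? Thanks for your suggestions.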

Answer 1

train_img is created with compression_opts=9. This is the highest compression level, so it requires the most work to compress and decompress.
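
To get a rough feel for what the level costs, here is a minimal sketch that times the standard library's zlib (the same deflate algorithm that the HDF5 gzip filter is built on) at a few levels. The helper name time_deflate and the synthetic buffer are my own assumptions for illustration; real photos compress differently, and the timings depend on your machine, so treat the numbers as indicative only.

```python
import random
import time
import zlib


def time_deflate(buf, level):
    """Return (seconds, compressed_size) for a single zlib.compress call."""
    start = time.perf_counter()
    out = zlib.compress(buf, level)
    return time.perf_counter() - start, len(out)


# Synthetic stand-in for one 224x224 RGB image stored as uint8 bytes:
# a repeating ramp with a little noise, so it is partly compressible.
rng = random.Random(0)
n_bytes = 224 * 224 * 3
raw = bytes((x * 7 + rng.randrange(16)) % 256 for x in range(n_bytes))

for level in (1, 4, 9):
    seconds, size = time_deflate(raw, level)
    print("level {}: {:.4f} s, {} -> {} bytes".format(level, seconds, len(raw), size))
```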

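If the time spent compressing images is the bottleneck and you can afford to trade disk space for speed, use a lower compression level, such as the default (4), or even disable compression entirely.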
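
A minimal sketch of that suggestion follows. The helper create_image_store and the file name data_fast.hdf5 are hypothetical names of mine. Storing the images as uint8 and using one image per chunk are also my own assumptions rather than part of the advice above; they may help further, but measure on your own data.

```python
import os
import tempfile

import h5py
import numpy as np


def create_image_store(path, n_images, compression=None, level=None):
    """Create an HDF5 file holding n_images 224x224x3 uint8 images.

    compression=None disables compression; compression="gzip" with a
    level from 0 to 9 trades write speed against file size.
    """
    kwargs = {}
    if compression is not None:
        kwargs["compression"] = compression
        if level is not None:
            kwargs["compression_opts"] = level
    hf = h5py.File(path, "w")
    hf.create_dataset(
        "train_img",
        shape=(n_images, 224, 224, 3),
        dtype=np.uint8,            # assumption: keep the raw 8-bit pixels
        chunks=(1, 224, 224, 3),   # assumption: one image per chunk
        **kwargs
    )
    return hf


# Demo in a temporary directory: gzip at level 4 instead of level 9.
demo_path = os.path.join(tempfile.mkdtemp(), "data_fast.hdf5")
hf = create_image_store(demo_path, n_images=4, compression="gzip", level=4)
hf["train_img"][0] = np.zeros((224, 224, 3), dtype=np.uint8)
hf.close()
```

Calling create_image_store(path, n) with no compression arguments gives the uncompressed variant.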
