Most pythonic way to batch load and process images

Date: 2018-01-25 18:14:01

Tags: python

The code below loads JPEG images into numpy ndarrays. It currently works fine, but I feel there must be a more pythonic way to do this.

import scipy.ndimage as spimg
import numpy as np


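# Read images with scipy and flatten them to greyscale.
# human_files is a sequence of image file paths defined elsewhere.
# A generator expression is used instead of a list comprehension
# for memory efficiency: images are loaded one at a time.
# Note: scipy.ndimage.imread was deprecated in SciPy 1.0 and removed
# in later releases, so this requires an older SciPy.
human_files_convert = (spimg.imread(path, flatten=True) for path in human_files[:2099])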

The generator expression above is used so that each image is processed individually; a list comprehension failed here.
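
To illustrate the difference, here is a minimal sketch of why a generator expression helps: it defers each load until the item is requested, whereas a list comprehension performs every load up front. `fake_load` and the file names are hypothetical stand-ins for a real image loader and real paths.

```python
# Records every call so we can see when loading actually happens
load_log = []


def fake_load(path):
    """Hypothetical stand-in for an image loader; records each call."""
    load_log.append(path)
    return [0] * 4  # placeholder for pixel data


paths = ["a.jpg", "b.jpg", "c.jpg"]  # hypothetical file names

# Generator expression: nothing is loaded when it is created
lazy_images = (fake_load(p) for p in paths)
calls_before_iteration = len(load_log)

# Requesting one item triggers exactly one load
first_image = next(lazy_images)
calls_after_first = len(load_log)

# List comprehension: every path is loaded immediately
eager_images = [fake_load(p) for p in paths]
calls_after_list = len(load_log)

print(calls_before_iteration, calls_after_first, calls_after_list)
```

With real images, the eager version holds every decoded array in memory at once, which is the likely reason the list comprehension failed.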

batch_size = 1000
step = 0
# Start with zero rows so the result holds only real images
# (np.empty((1, 250, 250)) would leave an uninitialised first row)
human_files_ndarray = np.empty((0, 250, 250))

# Images collected for the current batch
human_files_list = []
batch = 1
total_processed = 0

# Iterate through the lazily loaded image arrays
for image in human_files_convert:
    # Add the image to the current batch
    human_files_list.append(image)
    step += 1
    total_processed += 1
    # Flush when the batch is full or the last image has been read
    if (step % batch_size == 0) or (len(human_files[:2099]) == total_processed):
        # Stack the batch into one array and append it to the result
        new_stack = np.stack(human_files_list)
        print("Batch: ", batch)
        print(new_stack.shape)
        step = 0
        human_files_ndarray = np.concatenate((human_files_ndarray, new_stack))
        print(human_files_ndarray.shape)
        print(total_processed)
        # Reset the list for the next batch
        human_files_list = []
        batch += 1
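
For comparison, the manual counters in the loop above could be replaced by a small helper built on `itertools.islice`. This is only a sketch under assumptions: `batched` is a hypothetical helper name, and plain integers stand in for the image arrays.

```python
from itertools import islice


def batched(iterable, batch_size):
    """Yield successive lists of up to batch_size items from iterable."""
    iterator = iter(iterable)
    while True:
        # islice pulls at most batch_size items without consuming more
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Integers stand in for the 2099 lazily loaded images
fake_images = (i for i in range(2099))
batch_sizes = [len(b) for b in batched(fake_images, 1000)]
print(batch_sizes)  # [1000, 1000, 99]
```

With real data, each yielded batch could be passed to `np.stack`. Python 3.12 and later also ship `itertools.batched`, which yields tuples instead of lists.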

Any ideas on how to make this code more efficient or pythonic?

1 Answer:

Answer 0 (score: 0)

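Following @sascha's suggestion above, I sent the output of the generator expression to an HDF5 file. Doing so reduced memory usage from more than 4GB to less than 200MB. An added bonus is that I now have an on-disk copy of the loaded dataset, much like a pickle file.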

import scipy.ndimage as spimg
import numpy as np
import h5py
import tqdm

np.set_printoptions(threshold=1000)

# Number of images to process (named n_images to avoid shadowing
# the built-in slice)
n_images = len(human_files)

# Read images with scipy and flatten them to greyscale.
# A generator expression is used instead of a list comprehension
# for memory efficiency.
human_files_convert = (spimg.imread(path, flatten=True) for path in human_files[:n_images])

# Use h5py to store large uncompressed image arrays on disk.
# The with block closes the file even if an error occurs.
with h5py.File("images.hdf5", "w") as img:
    human_dset = img.create_dataset("human_images", (n_images, 250, 250))
    for i, r in enumerate(tqdm.tqdm(human_files_convert, total=n_images)):
        # Rescale [0,255] --> [0,1]
        r = r.astype('float32') / 255
        # Write the image into row i of the dataset
        human_dset[i] = r
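
Once the file has been written, batches can be read back from disk without loading the whole dataset. The following is a minimal sketch, assuming h5py and NumPy are installed. The helper names `write_images` and `read_batch`, the tiny 4x4 image size, the synthetic pixel values, and the temporary file path are illustrative assumptions, not part of the original answer.

```python
import os
import tempfile

import h5py
import numpy as np


def write_images(path, images, n_images, shape=(4, 4)):
    """Stream an iterable of 2-D arrays into a float32 HDF5 dataset."""
    with h5py.File(path, "w") as f:
        dset = f.create_dataset("human_images", (n_images, *shape), dtype="float32")
        for i, image in enumerate(images):
            # Rescale [0, 255] --> [0, 1] before writing row i
            dset[i] = image.astype("float32") / 255


def read_batch(path, start, stop):
    """Read rows start..stop-1 from disk as a NumPy array."""
    with h5py.File(path, "r") as f:
        # Slicing a dataset copies only the requested rows into memory
        return f["human_images"][start:stop]


# Six synthetic greyscale images with constant values 0, 51, ..., 255
fake_images = (np.full((4, 4), 51 * k, dtype="uint8") for k in range(6))

tmp_path = os.path.join(tempfile.mkdtemp(), "images.hdf5")
write_images(tmp_path, fake_images, n_images=6)

batch = read_batch(tmp_path, 2, 5)
print(batch.shape, batch.dtype)
```

Because only the requested slice is materialised, a later training loop could iterate over the file in fixed-size windows while keeping memory use low.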