The code below loads JPEG images into NumPy ndarrays. It currently works fine, but I feel there must be a more Pythonic way to do this.
import scipy.ndimage as spimg
import numpy as np
# Read images into scipy and flatten to greyscale
# Using a generator expression instead of a list comprehension
# for memory efficiency
human_files_convert = (spimg.imread(path, flatten=True) for path in human_files[:2099])
I use the generator expression above so that each image can be processed individually; a list comprehension fails here because it runs out of memory.
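For reference, the list-comprehension version that fails decodes and holds every image in memory at once:

# This materialises all ~2099 decoded images in RAM simultaneously
human_files_convert = [spimg.imread(path, flatten=True) for path in human_files[:2099]]

The batching below therefore works from the generator, stacking and concatenating as it goes: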
batch_size = 1000
step = 0
# Start from a zero-length array so no uninitialised frame is prepended
human_files_ndarray = np.empty((0, 250, 250))
# Create empty list to collect image arrays
human_files_list = []
batch = 1
total_processed = 0
# Iterate through image arrays
for path in human_files_convert:
    # Append to list
    human_files_list.append(path)
    step += 1
    total_processed += 1
    # Stack the accumulated arrays once per batch
    # (or when the final image has been processed)
    if (step % batch_size == 0) or (len(human_files[:2099]) == total_processed):
        new_stack = np.stack(human_files_list)
        print("Batch: ", batch)
        print(new_stack.shape)
        step = 0
        human_files_ndarray = np.concatenate((human_files_ndarray, new_stack))
        print(human_files_ndarray.shape)
        print(total_processed)
        # Reset the list for the next batch
        human_files_list = []
        batch += 1
Any ideas on how to make this code more efficient or Pythonic?
Answer 0 (score: 0)
Following @sascha's suggestion above, I send the output of the generator expression to an HDF5 file. Doing so took memory usage from over 4 GB down to less than 200 MB. As an added bonus, I now have an on-disk copy of the loaded dataset, much like a pickle file.
# Confirm correct import of images
import scipy.ndimage as spimg
import numpy as np
import h5py
import tqdm

np.set_printoptions(threshold=1000)
# Use h5py to store large uncompressed image arrays
img = h5py.File("images.hdf5", "w")
human_dset = img.create_dataset("human_images", (len(human_files), 250, 250))
# Read images into scipy and flatten to greyscale
# Using a generator expression instead of a list comprehension
# for memory efficiency
slice = len(human_files)
human_files_convert = (spimg.imread(path, flatten=True) for path in human_files[:slice])
i = 0
for r in tqdm.tqdm(human_files_convert, total=slice):
    # Rescale [0, 255] --> [0, 1]
    r = r.astype('float32') / 255
    # Insert row into dset
    human_dset[i] = r
    i += 1
img.close()
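To read the dataset back from disk later, something like the following works (a minimal sketch; the 100-image slice is just an arbitrary example):

import h5py

with h5py.File("images.hdf5", "r") as img:
    human_images = img["human_images"]   # lazy handle; nothing is loaded yet
    print(human_images.shape, human_images.dtype)
    # Only the slice you index is actually read into RAM
    first_batch = human_images[:100]

Because h5py reads only the slices you index, the full array never has to fit in memory at once.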