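One of the tasks in my project is to load a dataset (chars74k) and set a label for each image. In this implementation, I already have a matrix containing other images and a list with their respective labels. To do this, I wrote the following code: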
```python
import glob
import os

import numpy
from PIL import Image

# images: (input/output) matrix of images
# labels: (input/output) list of labels
# path:   (input) path to my root folder of images. It looks like this:
#   path
#   |-folder1
#   |-folder2
#   |-folder3
#   |-...
#   |-lastFolder
def loadChars74k(images, labels, path):
    # list of directories
    dirlist = [item for item in os.listdir(path) if os.path.isdir(os.path.join(path, item))]
    # for each subfolder, open all files, append each image to `images` and use the folder name as its label
    for subfolder in dirlist:
        imagePath = glob.glob(os.path.join(path, subfolder, '*.Bmp'))
        print("folder", subfolder, "has", len(imagePath), "images; matrix of images is:", images.shape, "labels are:", len(labels))
        for i in range(len(imagePath)):
            anImage = numpy.array(Image.open(imagePath[i]).convert('L'), 'f').ravel()
            # vstack returns a new array, so `images` is rebound (and fully copied) for every image
            images = numpy.vstack((images, anImage))
            labels.append(subfolder)
    # `images` was rebound locally, so it must be returned for the caller to see the result
    return images, labels
```
It works, but it takes far too long (about 20 minutes). Is there a faster way to load the images and set the labels?
Regards.
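For context on why the loop above is slow: `numpy.vstack` does not append in place. It allocates a new array and copies every row accumulated so far, so loading n images one at a time copies on the order of n²/2 rows in total. The same behavior means that rebinding `images` inside the function never changes the caller's array. A minimal numpy-only demonstration (the array sizes here are arbitrary):

```python
import numpy as np

images = np.empty((0, 4), dtype=np.float32)
before = images

# "Append" one flattened image the same way the question's loop does
images = np.vstack((images, np.ones(4, dtype=np.float32)))

print(before.shape)                      # (0, 4): the original array is unchanged
print(images.shape)                      # (1, 4)
print(np.shares_memory(before, images))  # False: vstack built a brand-new array
```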
Answer
After some research, I was able to improve the code as follows:
```python
import glob
import os

import numpy
from PIL import Image

def loadChars74k(images, labels, path):
    # list of directories
    dirlist = [item for item in os.listdir(path) if os.path.isdir(os.path.join(path, item))]
    # for each subfolder, load all its images at once, append them to `images`, and use the folder name as the label
    for subfolder in dirlist:
        imagePath = glob.glob(os.path.join(path, subfolder, '*.Bmp'))
        if not imagePath:
            continue  # an empty folder would yield a 1-D empty array that vstack cannot stack
        # build one (n_images, n_pixels) array per folder instead of calling vstack once per image
        im_array = numpy.array([numpy.array(Image.open(p).convert('L'), 'f').ravel() for p in imagePath])
        images = numpy.vstack((images, im_array))
        labels.extend([subfolder] * len(imagePath))
    return images, labels
```
I'm fairly sure it can be improved further, but it's good enough for now: it now takes 33 seconds instead of about 20 minutes!
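One possible further step is to avoid `vstack` until the very end: collect every flattened image in a plain Python list across all folders and call `np.stack` once, so the whole dataset is copied into the final matrix a single time. This is a sketch, not the poster's code. The function name `load_chars74k_fast`, the `pattern` parameter and the sorted file order are assumptions, and like the original it assumes every image has the same size.

```python
import glob
import os

import numpy as np
from PIL import Image


def load_chars74k_fast(path, pattern="*.Bmp"):
    """Load every image under path/<subfolder>/ as one flattened grayscale row.

    Returns (images, labels): images is an (N, H*W) float32 array and labels
    is a list of N subfolder names. Assumes all images share the same size.
    """
    rows = []
    labels = []
    # sorted() makes the row order reproducible across runs and filesystems
    for subfolder in sorted(os.listdir(path)):
        folder = os.path.join(path, subfolder)
        if not os.path.isdir(folder):
            continue
        for file_path in sorted(glob.glob(os.path.join(folder, pattern))):
            with Image.open(file_path) as img:
                # 'L' = 8-bit grayscale; float32 matches the original 'f' dtype
                rows.append(np.asarray(img.convert("L"), dtype=np.float32).ravel())
            labels.append(subfolder)
    if not rows:
        return np.empty((0, 0), dtype=np.float32), labels
    # A single allocation and copy for the whole dataset
    return np.stack(rows), labels
```

To merge the result with an existing matrix as in the question, call it once and do a single `images = np.vstack((images, new_images))` plus `labels.extend(new_labels)` (skipping the merge when no images were found). At that point most of the remaining time is probably spent decoding the image files rather than copying arrays.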