I'm doing some image processing, but I have a lot of images (~10,000). So I want to run it in parallel, but for some reason it isn't as fast as it should be. I'm using a MacBook Pro with 16 GB of RAM and an i7. The code looks like this:
import os
import cv2
from multiprocessing import Pool

def process_image(img_name):
    im = cv2.imread('images/' + img_name)
    tfs_im = some_function(im)  # uses OpenCV, skimage and math
    cv2.imwrite('new_img/' + img_name, tfs_im)

if __name__ == '__main__':
    ### Set Working Dir
    wd_path = os.path.dirname(os.path.realpath(__file__))
    os.chdir(wd_path + '/..')

    img_list = os.listdir('images')

    pool = Pool(processes=8)
    pool.map(process_image, img_list)  # process the img_list iterable with the pool
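The same call can also take an explicit chunksize so that each worker receives larger batches of file names instead of one at a time (a sketch of the same idea, not something I have measured):

# same process_image as above; only the map call changes
pool = Pool(processes=8)
pool.map(process_image, img_list, chunksize=len(img_list) // 8 + 1)
pool.close()
pool.join()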
I also tried a more basic approach using a queue.
import os
import cv2
from multiprocessing import Process, Queue

def process_image(img_names):
    for img_name in img_names:
        im = cv2.imread('image/' + img_name)
        tfs_im = some_function(im)  # uses OpenCV, skimage and math
        cv2.imwrite('new_img/' + img_name, tfs_im)

if __name__ == '__main__':
    ### Set Working Dir
    wd_path = os.path.dirname(os.path.realpath(__file__))
    os.chdir(wd_path + '/..')

    q = Queue()  # note: never actually used below
    img_list = os.listdir('image')

    # split the work across 8 processes
    processes = 8

    def splitlist(inlist, chunksize):
        return [inlist[x:x + chunksize] for x in range(0, len(inlist), chunksize)]

    list_splitted = splitlist(img_list, len(img_list) // processes + 1)

    workers = []
    for imgs in list_splitted:
        p = Process(target=process_image, args=(imgs,))
        p.daemon = True
        p.start()
        workers.append(p)
    for p in workers:
        p.join()  # without this, the daemonized workers are killed as soon as the main process exits
Neither of these reaches the expected speed. I know each process needs some setup time, so the code won't run 8× faster, but so far it only runs about 2× faster than the single-threaded version.
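For reference, this is roughly how the two runs can be timed inside the `if __name__ == '__main__':` block (a sketch; process_image and img_list are the ones from the first snippet):

import time
from multiprocessing import Pool

# serial baseline: process every image in the main process
t0 = time.time()
for name in img_list:
    process_image(name)
serial_time = time.time() - t0

# parallel run: the same work spread over 8 worker processes
t0 = time.time()
pool = Pool(processes=8)
pool.map(process_image, img_list)
pool.close()
pool.join()
parallel_time = time.time() - t0

print('serial: %.1fs, parallel: %.1fs, speed-up: %.1fx'
      % (serial_time, parallel_time, serial_time / parallel_time))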
Maybe some of the work simply doesn't parallelize, for example reading and writing images from/to the same folder in different processes?
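One way to test that hypothesis would be to time the disk I/O and the computation separately in a single process, something along these lines (a sketch; paths and some_function as in the first snippet):

import time
import cv2

read_time = write_time = compute_time = 0.0
for img_name in img_list[:500]:  # a sample is enough to see the ratio
    t0 = time.time()
    im = cv2.imread('images/' + img_name)
    read_time += time.time() - t0

    t0 = time.time()
    tfs_im = some_function(im)
    compute_time += time.time() - t0

    t0 = time.time()
    cv2.imwrite('new_img/' + img_name, tfs_im)
    write_time += time.time() - t0

print('read: %.1fs, compute: %.1fs, write: %.1fs'
      % (read_time, compute_time, write_time))

If reading and writing dominate some_function, the disk is the bottleneck and adding more processes won't help much.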
Thanks for any tips or suggestions!