Multiprocessing on chunks of an image

Asked: 2019-04-18 22:36:16

Tags: python python-3.x multiprocessing

I have a function that has to loop through individual pixels of an image and calculate some geometry. This function takes a very long time to run (~5 hours on a 24 megapixel image), but it seems like it should be easy to run in parallel across multiple cores. However, I can't for the life of me find a well-documented, well-explained example of doing this sort of thing with the multiprocessing package. Here is the code I'm running right now as a toy example:

import numpy as np
import matplotlib.pyplot as plt
from scipy import misc
from skimage import color
import multiprocessing 
from multiprocessing import Process

#Some dumb stand in function for this exercise
def dumb_func(image):
    ny, nx = image.shape
    temp = np.empty_like(image)

    for y in range(ny):
        for x in range(nx):
            temp[y, x] = np.square(image[y, x])

    return temp

#Convert image to greyscale
img = color.rgb2gray(misc.ascent())

#Resize the image
ns = 2048 #Pixel size
img = misc.imresize(img, size = (ns, ns))


#Split the image into equal chunks...not sure how this works for arrays that
#are weird shapes and aren't the same size in each dimension

divs = 4
init_split = np.array_split(img, divs, axis = 0)
side = init_split[0].shape[0]
chunked = np.empty((divs, divs, side, side))
cur = 0
for i in range(divs):
    split = np.array_split(init_split[i], divs, axis = 1)
    for j in range(divs):
        chunked[i, j, :, :] = split[j]
        cur +=1

#Pull core count and divide by two to be safe
cores = int(multiprocessing.cpu_count() / 2)

result = np.empty_like(chunked)
idxs = np.array(np.meshgrid(np.arange(0, divs, 1), 
                            np.arange(0, divs, 1))).T.reshape(-1, 2)

Basically this code loads in an image, converts it to grayscale, makes it bigger, and then chunks it up. The chunked array has shape (i, j, ny, nx), where i and j are the indices identifying which chunk of the image is being worked on, and ny, nx describe the pixel size of each chunk.

Additionally, I'm creating an array called idxs that stores every possible index into the chunked array, for pulling out the chunked images.

What I'd like to do is run a function (in this case dumb_func) over the chunks in parallel and store the results in a result array of the same shape. The way I imagined doing it is to loop over the idxs array, assign the chunks belonging to those indices to processes up to the number of cores, wait for those cores to finish, then keep feeding more work to the cores until it's done. I got stuck because I couldn't A) figure out how to access the return values from the function, and B) figure out how to handle a situation where I might have, say, 16 chunks and 5 cores, so the last iteration only needs one process.

How can I go about this? I've spent the last 6-7 hours reading about multiprocessing Pool, Process, map, starmap, etc., and I can't for the life of me understand how to implement this.
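For context, here is roughly the kind of thing I had in mind, sketched (untested) with Pool.starmap against the chunked, idxs, result and cores variables defined above; process_chunk is just a placeholder for the real per-chunk computation, and I'm not sure this is the right way to do it:

from multiprocessing import Pool

def process_chunk(i, j, chunk):
    #Placeholder worker: return the chunk indices along with the processed block
    return i, j, np.square(chunk)

#One task per chunk; the Pool takes care of scheduling them onto the cores
#(on Windows this, and the setup above, would need to live under if __name__ == '__main__':)
tasks = [(i, j, chunked[i, j]) for i, j in idxs]
with Pool(processes=cores) as pool:
    for i, j, res in pool.starmap(process_chunk, tasks):
        result[i, j] = res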

Edit for Reedinationer:

Here is my updated code, which runs without errors. However, the new_data array never gets updated. I filled it with a value of 100, and at the end of the routine new_data is exactly how it was initialized.

import numpy as np
import matplotlib.pyplot as plt
from scipy import misc
from multiprocessing import Process, JoinableQueue
from time import time

#Some dumb stand-in function for this exercise
def dumb_func(q, new_data):
    while True:
        index, image = q.get()
        temp = image **2

        new_data[index[0], index[1], :, :] = temp
        q.task_done()

if __name__ == "__main__":
    start = time()
    q = JoinableQueue()
    img = misc.ascent()
    #Resize the image
    ns = 2048 #Pixel size
    img = misc.imresize(img, size = (ns, ns))
    #Split the image into equal chunks...not sure how this works for arrays that
    #are weird shapes and aren't the same size in each dimension

    divs = 4
    init_split = np.array_split(img, divs, axis = 0)
    side = init_split[0].shape[0]
    chunked = np.empty((divs, divs, side, side))
    cur = 0
    for i in range(divs):
        split = np.array_split(init_split[i], divs, axis = 1)
        for j in range(divs):
            chunked[i, j, :, :] = split[j]
            cur +=1

    new_data = np.full(chunked.shape, 100)
    idxs = np.array(np.meshgrid(np.arange(0, divs, 1), 
                                np.arange(0, divs, 1))).T.reshape(-1, 2)

    for i in range(len(idxs)):
        q.put((idxs[i], chunked[idxs[i][0], idxs[i][1], :, :]))

    print ('starting workers')

    worker_count = len(idxs)
    processes = []
    for i in range(worker_count):
        p = Process(target=dumb_func, args=[q, new_data])
        p.daemon = True
        p.start()
    print('main thread waiting')
    q.join()

    end = time()
    print('{:.3f} seconds elapsed'.format(end - start))
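My best guess right now is that each Process gets its own copy of new_data, so writes inside dumb_func never make it back to the parent. A rough, untested sketch of one workaround I'm considering: ship the results back through a second Queue and let the main process do the writing (sizes below are made up):

import numpy as np
from multiprocessing import Process, JoinableQueue, Queue

#Workers send (index, result) pairs back instead of writing to their private copy of new_data
def dumb_func(q, results):
    while True:
        index, image = q.get()
        results.put((index, image ** 2))
        q.task_done()

if __name__ == "__main__":
    divs, side = 4, 512                     #Made-up sizes for the sketch
    chunked = np.random.rand(divs, divs, side, side)
    new_data = np.full(chunked.shape, 100.0)

    q, results = JoinableQueue(), Queue()
    for i in range(divs):
        for j in range(divs):
            q.put(((i, j), chunked[i, j]))

    for _ in range(4):
        p = Process(target=dumb_func, args=[q, results], daemon=True)
        p.start()

    q.join()                                #Wait until every chunk has been processed
    for _ in range(divs * divs):            #Collect exactly one result per chunk
        (i, j), temp = results.get()
        new_data[i, j, :, :] = temp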

2 Answers:

Answer 0 (score: 2)

I'd do something like this, starting with the dependencies:

from multiprocessing import Pool
import numpy as np
from PIL import Image

# and some for testing
from random import random
from time import sleep

First I define a function that splits an image up into "chunks", sort of as you've done:

def chunkit(ys, xs, blocksize=64):
    for y in range(0, ys, blocksize):
        yt = (y, min(ys, y + blocksize))
        for x in range(0, xs, blocksize):
            xt = (x, min(xs, x + blocksize))
            yield yt, xt

This is a lazy iterator, so it can keep going for a while.
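For example, on a tiny 10x6 "image" it yields pairs of (y, x) block bounds, clamped at the edges:

>>> list(chunkit(10, 6, blocksize=4))
[((0, 4), (0, 4)), ((0, 4), (4, 6)), ((4, 8), (0, 4)), ((4, 8), (4, 6)), ((8, 10), (0, 4)), ((8, 10), (4, 6))]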

Then I define my worker function:

def dumb_func(cc):
    (y0,y1), (x0,x1) = cc
    # convert to floats for ease of processing
    chunk = image[y0:y1,x0:x1] / 255.
    # random slow down for testing
    # sleep(random() ** 6)
    res = chunk ** 2
    # convert back to bytes for efficiency
    return cc, (res * 255).astype(np.uint8)

I make sure the source array stays as close to its original format as possible for efficiency, and send it back in the same format (this might take some fiddling if you're dealing with other pixel formats, obviously).

Then I put it all together:

if __name__ == '__main__':
    source = Image.open('tmp.jpeg')
    image = np.asarray(source)
    print("loaded", image.shape, image.dtype)

    with Pool() as pool:
        resit = pool.imap_unordered(
            dumb_func, chunkit(*image.shape[:2]))

        output = np.empty_like(image)
        for cc, res in resit:
            (y0,y1), (x0,x1) = cc
            output[y0:y1,x0:x1] = res

    im = Image.fromarray(output, 'RGB')
    im.save('out.jpeg')

This churns through a 15M-pixel image in a couple of seconds, with most of that time spent loading/saving the image. It could probably be a lot smarter about array strides and cache friendliness, but hopefully it helps!

Note: I think this code relies on CPython's Unix-style process forking semantics to make sure the image is shared between processes efficiently. Not sure what would happen if you ran it elsewhere.
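If you did need it on a spawn-based platform (e.g. Windows), one option, sketched here and untested, is to hand the image to each worker once via the Pool's initializer instead of relying on fork:

def init_worker(img):
    # runs once in every worker process; makes the image visible to dumb_func
    global image
    image = img

# ...then build the pool as:
#     with Pool(initializer=init_worker, initargs=(image,)) as pool:
# and leave the rest of the main block unchanged.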

Answer 1 (score: 1)

I've basically been writing code for this same thing. The goal right now is to replace white pixels with transparent ones, but it seems to be replacing the whole image, so there's a bug in there somewhere... it no longer raises errors inside the multiprocessing module though, so maybe it can serve as an example of how to load up a Queue and then have your worker processes work through it!

from PIL import Image
from multiprocessing import Process, JoinableQueue
from threading import Thread
from time import time

def worker_function(q, new_data):
    while True:
        # print("Items in queue: {}".format(q.qsize()))
        index, pixel = q.get()
        if pixel[0] > 240 and pixel[1] > 240 and pixel[2] > 240:
            out_pixel = (0, 0, 0, 0)
        else:
            out_pixel = pixel
        new_data[index] = out_pixel
        q.task_done()

if __name__ == "__main__":
    start = time()
    q = JoinableQueue()

    my_image = Image.open('InputImage.jpg')
    my_image = my_image.convert('RGBA')
    datas = list(my_image.getdata())
    new_data = [0] * len(datas) # make a blank array the size of our image to fill later

    print('putting image into queue')
    for count, item in enumerate(datas):
        q.put((count, item))

    print('starting workers')
    worker_count = 50
    processes = []
    for i in range(worker_count):
        p = Process(target=worker_function, args=[q, new_data])
        p.daemon = True
        p.start()
    print('main thread waiting')
    q.join()
    my_image.putdata(new_data)
    my_image.save('output.png', "PNG")

    end = time()
    print('{:.3f} seconds elapsed'.format(end - start))

I think it's important to "shield" your code inside the if __name__ == "__main__" block, otherwise the spawned processes seem to run it as well.

Update

It looks like you need to implement a Manager() (or maybe there are other ways I'm not aware of!). I got my code to run by changing it to:

from PIL import Image
from multiprocessing import Process, JoinableQueue, Manager
from threading import Thread
from time import time


def worker_function(q, new_data):
    while True:
        # print("Items in queue: {}".format(q.qsize()))
        index, pixel = q.get()
        if pixel[0] > 240 and pixel[1] > 240 and pixel[2] > 240:
            out_pixel = (0, 0, 0, 0)
        else:
            out_pixel = pixel
        new_data[index] = out_pixel
        q.task_done()


if __name__ == "__main__":
    start = time()
    q = JoinableQueue()
    my_image = Image.open('InputImage.jpg')
    my_image = my_image.convert('RGBA')
    datas = list(my_image.getdata())
    # new_data = [(0, 0, 0, 0)]*len(datas)
    manager = Manager()
    new_data = manager.list([(0, 0, 0, 0)]*len(datas))
    print(new_data)
    print('putting image into queue')
    for count, item in enumerate(datas):
        q.put((count, item))

    print('starting workers')
    worker_count = 50
    processes = []
    for i in range(worker_count):
        p = Process(target=worker_function, args=[q, new_data])
        p.daemon = True
        p.start()
    print('main thread waiting')
    q.join()
    print("Saving Image")
    my_image.putdata(new_data)
    my_image.save('output.png', "PNG")

    end = time()
    print('{:.3f} seconds elapsed'.format(end - start))

This doesn't seem like the fastest option, though! I'm sure there are other ways to increase the speed. My code to do the same thing with Thread looks very similar:

from PIL import Image
from threading import Thread
from queue import Queue
import time

start = time.time()
q = Queue()

planeIm = Image.open('InputImage.jpg')
planeIm = planeIm.convert('RGBA')
datas = planeIm.getdata()
new_data = [0] * len(datas)

print('putting image into queue')
for count, item in enumerate(datas):
    q.put((count, item))

def worker_function():
    while True:
        # print("Items in queue: {}".format(q.qsize()))
        index, pixel = q.get()
        if pixel[0] > 240 and pixel[1] > 240 and pixel[2] > 240:
            out_pixel = (0, 0, 0, 0)
        else:
            out_pixel = pixel
        new_data[index] = out_pixel
        q.task_done()

print('starting workers')
worker_count = 100
for i in range(worker_count):
    t = Thread(target=worker_function)
    t.daemon = True
    t.start()
print('main thread waiting')
q.join()
print('Queue has been joined')
planeIm.putdata(new_data)
planeIm.save('output.png', "PNG")

end = time.time()

elapsed = end - start
print('{:3.3} seconds elapsed'.format(elapsed))

Processing my image takes ~23 seconds with threads, though, versus ~170 seconds with multiprocessing! I suspect this comes from the larger overhead needed to spin up Process objects, plus the fact that my per-pixel algorithm is simple for now (just the if pixel[0] > 240 and pixel[1] > 240 and pixel[2] > 240: bit), so I'm likely not getting the speed improvements that a complex pixel-processing algorithm would give me. Also note from the multiprocessing documentation:

    A single manager can be shared by processes on different computers over a network. They are, however, slower than using shared memory.

This leads me to believe there are faster alternatives.
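One such alternative (an untested sketch of the same white-to-transparent idea, using numpy rows for brevity) would be to batch the work, e.g. send one whole image row per task through a Pool instead of one pixel per queue item, so far less data crosses process boundaries:

from multiprocessing import Pool
from PIL import Image
import numpy as np

def whiten_row(row):
    # row is a (width, 4) uint8 array; make near-white pixels fully transparent
    out = row.copy()
    white = (row[:, 0] > 240) & (row[:, 1] > 240) & (row[:, 2] > 240)
    out[white] = (0, 0, 0, 0)
    return out

if __name__ == '__main__':
    img = Image.open('InputImage.jpg').convert('RGBA')
    data = np.asarray(img)                       # shape (height, width, 4)
    with Pool() as pool:
        rows = pool.map(whiten_row, list(data))  # one task per image row
    Image.fromarray(np.stack(rows), 'RGBA').save('output.png')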