Question

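I have some code that reads hundreds of thousands of images and stores them in an array. Everything works fine, and the code is much faster than without multiprocessing. However, when I add one particular line, the code just hangs, even though that same line runs fine when I am not using multiprocessing. The line that breaks it (the loop does not complete even a single iteration) is: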

cv2.addWeighted(img,4, cv2.GaussianBlur(img , (0,0) , 30) ,-4 ,128) #line that breaks it
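
For reference, OpenCV documents cv2.addWeighted(src1, alpha, src2, beta, gamma) as computing saturate(src1*alpha + src2*beta + gamma) per element, and a ksize of (0,0) in GaussianBlur means the kernel size is derived from sigma. With the arguments above, the line is a high-pass filter that amplifies the difference between the image and its heavy blur and re-centres it at 128:

```latex
\mathrm{dst}(x,y) = \operatorname{saturate}\!\left( 4\,\mathrm{img}(x,y) \;-\; 4\,\big(G_{\sigma=30} * \mathrm{img}\big)(x,y) \;+\; 128 \right),
\qquad \operatorname{saturate}(v) = \min\!\big(\max(v, 0), 255\big) \text{ for uint8}
```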

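I can't find any reason why this works perfectly in a single process but not with multiprocessing. To be clear, I'm not saying the code runs slowly: it doesn't run at all, it just sits there doing nothing. When it reaches this point, my CPU usage drops to around 5%. Simply commenting out that line makes the code run perfectly.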

PS: In case anyone notices, I know the code that splits the DataFrame into equal parts isn't perfect, but that doesn't affect the problem at all.
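Since the splitting came up: a minimal sketch of one way to compute contiguous, near-equal chunk boundaries that cover every row, including the remainder that integer division leaves over. `chunk_bounds` is a hypothetical helper, not part of the original code; for NumPy arrays, np.array_split produces the same kind of split.

```python
from typing import List, Tuple


def chunk_bounds(n_items: int, n_chunks: int) -> List[Tuple[int, int]]:
    """Return (start, stop) pairs that split range(n_items) into n_chunks
    contiguous pieces whose sizes differ by at most one."""
    base, extra = divmod(n_items, n_chunks)
    bounds = []
    start = 0
    for k in range(n_chunks):
        # The first `extra` chunks take one additional item to absorb the remainder.
        stop = start + base + (1 if k < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


# Example: 10 rows spread over 4 workers.
print(chunk_bounds(10, 4))  # → [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Inside a worker, something like `start, stop = chunk_bounds(len(x), n_cpu)[frac]` followed by `x.iloc[start:stop]` would then replace the three slicing branches.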

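import os
import multiprocessing
from multiprocessing import Pool

import cv2
import numpy as np
import pandas as pd

# X_train, X_test, train_path and test_path are defined elsewhere in the script:
# tables whose rows are (filename, label) and the matching image root directories.

def get_images(inp):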
    if inp[0] == 'test':
        x = X_test
        path = test_path
    elif inp[0] == 'train':
        x = X_train
        path = train_path
    frac=inp[1]-1
    start=len(x)//n_cpu*frac
    count = 0
    if frac==0:
        x = pd.DataFrame(np.array(x)[:len(x)//n_cpu*(frac+1)+1])
    elif frac<n_cpu:
        x = pd.DataFrame(np.array(x)[start+1:len(x)//n_cpu*(frac+1)+1])
    elif frac == n_cpu:
        x = pd.DataFrame(np.array(x)[start+1:])
    x_ret = np.empty((len(x), 50, 50, 3), dtype=np.uint8)
    for i in range(len(x)):
        count+=1
        test = x.iloc[i,:]
        if test[1] == 0:
            full_path = os.path.join(path,'no_idc',test[0])
        elif test[1] == 1:
            full_path = os.path.join(path,'has_idc',test[0])
        img = cv2.imread(full_path,cv2.IMREAD_COLOR)
        if img.shape[:2] != (50, 50):  # colour images have shape (h, w, 3)
            img = cv2.resize(img,(50,50))
        img = cv2.addWeighted(img,4, cv2.GaussianBlur(img , (0,0) , 30) ,-4 ,128) #line that breaks
        x_ret[i, ...] = img
    return x_ret

n_cpu = multiprocessing.cpu_count()

p = Pool(n_cpu)

quer=(['test',fraction] for fraction in range(1,n_cpu+1))

X_test = np.concatenate([i for i in p.map(get_images, quer)])
# X_test = X_test.reshape(*X_test.shape,1)
p.close()
p.join()
print("Loaded Test set")
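A frequently reported cause of this kind of hang is OpenCV's internal thread pool (used by functions such as GaussianBlur) misbehaving in a child created by fork(), which is the default start method for Pool on Linux in most Python versions. That is an assumption, not something the question confirms. Two commonly suggested mitigations are calling cv2.setNumThreads(0) in each worker and creating the pool with the "spawn" start method. The sketch below combines both; `init_worker` and `make_pool` are hypothetical names.

```python
import multiprocessing


def init_worker():
    """Pool initializer: runs once in each worker process before any task."""
    # Assumption: the hang comes from OpenCV's internal threading in a forked child.
    import cv2  # imported here so this sketch itself does not need OpenCV at import time
    cv2.setNumThreads(0)  # 0 = OpenCV runs its functions sequentially in this process


def make_pool(n_workers=None):
    """Create a pool using the 'spawn' start method and the initializer above.
    Call this only under `if __name__ == "__main__":`."""
    ctx = multiprocessing.get_context("spawn")
    return ctx.Pool(n_workers or multiprocessing.cpu_count(), initializer=init_worker)
```

With spawn, every worker re-imports the main module, so get_images, X_test and test_path must be defined at module level, and the pool must be created inside the main guard, for example `with make_pool() as p: X_test = np.concatenate(p.map(get_images, quer))`.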

cv2.addWeighted hangs in multiprocessing

0 answers