I am wondering why my CPU load is so low even though I am not getting a high processing rate:
import time
from multiprocessing import Pool
import numpy as np
from skimage.transform import AffineTransform, SimilarityTransform, warp
center_shift = 256 / 2
tf_center = SimilarityTransform(translation=-center_shift)
tf_uncenter = SimilarityTransform(translation=center_shift)

def sample_gen_random_i():
    for i in range(10000000000000):
        x = np.random.rand(256, 256, 4)
        y = [0]
        yield x, y

def augment(sample):
    x, y = sample
    rotation = 2 * np.pi * np.random.random_sample()
    translation = 5 * np.random.random_sample(), 5 * np.random.random_sample()
    scale_factor = np.random.random_sample() * 0.2 + 0.9
    scale = scale_factor, scale_factor
    tf_augment = AffineTransform(scale=scale, rotation=rotation, translation=translation)
    tf = tf_center + tf_augment + tf_uncenter
    warped_x = warp(x, tf)
    return warped_x, y

def augment_parallel_sample_gen(samples):
    p = Pool(4)
    for sample in p.imap_unordered(augment, samples, chunksize=10):
        yield sample
    p.close()
    p.join()

def augment_sample_gen(samples):
    for sample in samples:
        yield augment(sample)

# This is slow and the single cpu core has 100% load
print('Single Thread --> Slow')
samples = sample_gen_random_i()
augmented = augment_sample_gen(samples)
start = time.time()
for i, sample in enumerate(augmented):
    print(str(i) + '|' + str(i / (time.time() - start))[:6] + ' samples / second', end='\r')
    if i >= 2000:
        print(str(i) + '|' + str(i / (time.time() - start))[:6] + ' samples / second')
        break

# This is slow and there is only light load on the cpu cores
print('Multithreaded --> Slow')
samples = sample_gen_random_i()
augmented = augment_parallel_sample_gen(samples)
start = time.time()
for i, sample in enumerate(augmented):
    print(str(i) + '|' + str(i / (time.time() - start))[:6] + ' samples / second', end='\r')
    if i >= 2000:
        print(str(i) + '|' + str(i / (time.time() - start))[:6] + ' samples / second')
        break

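I am using imap of multiprocessing.Pool, but I think there is some overhead. I can reach about 500 samples/second without augmentation and without multiprocessing, about 150 with augmentation but without multiprocessing, and about 170 with both augmentation and multiprocessing, so I suspect there must be something wrong with my approach. The code should be executable and self-explanatory! :)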
Answer 0 (score: 0)
The problem seems to be
return warped_x, y
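Passing the image to the subprocess and passing the whole transformed image back to the main process seems to be the bottleneck. If I only return the first pixel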
return x[0, 0, 0], y
and move the sample creation into the subprocess
def augment(y):
    x = np.random.rand(256, 256, 4)
    rotation = 2 * np.pi * np.random.random_sample()
    ...
the speed scales almost linearly with the number of cores...
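To get a rough feel for that transfer cost, here is a minimal sketch that times a pickle round trip for one sample. multiprocessing pickles every task and every result to move it between processes, so each sample in the question pays this cost once on the way to the worker and once on the way back. The helper name `pickle_roundtrip_seconds` is hypothetical, and the timings will differ from machine to machine.

```python
import pickle
import time

import numpy as np


def pickle_roundtrip_seconds(obj, repeats=20):
    """Average seconds to pickle and unpickle obj once (hypothetical helper)."""
    start = time.perf_counter()
    for _ in range(repeats):
        pickle.loads(pickle.dumps(obj))
    return (time.perf_counter() - start) / repeats


# Same shape and dtype as the samples in the question:
# 256 * 256 * 4 float64 values = 2 MiB of raw data.
x = np.random.rand(256, 256, 4)
payload = pickle.dumps(x)

print('array bytes  :', x.nbytes)
print('pickle bytes :', len(payload))
print('whole image  :', pickle_roundtrip_seconds(x), 'seconds per round trip')
print('single pixel :', pickle_roundtrip_seconds(x[0, 0, 0]), 'seconds per round trip')
```

If the whole-image round trip is a noticeable fraction of the time one `augment` call takes, the workers spend much of their time waiting on data rather than computing, which would match the light CPU load.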
Maybe threads are a better fit than processes (?)
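One way to try that idea is to swap `Pool` for `multiprocessing.pool.ThreadPool`, which offers the same `imap_unordered` interface but shares memory with the main thread, so nothing has to be pickled. This is only a sketch under assumptions: `fake_augment` is a hypothetical stand-in for the real `augment`, and threads only give a speedup if the heavy work releases the GIL, which I have not verified for `warp`. As far as I know a thread pool also reads its input ahead of the workers, so the endless generator is bounded with `islice` here.

```python
from itertools import islice
from multiprocessing.pool import ThreadPool

import numpy as np


def sample_gen():
    """Endless stream of random samples, shaped as in the question."""
    while True:
        yield np.random.rand(256, 256, 4), [0]


def fake_augment(sample):
    """Hypothetical stand-in for augment(); swap in the real function."""
    x, y = sample
    return x[::-1].copy(), y  # vertically flipped copy of the image


def augment_threaded_sample_gen(samples, workers=4):
    """Yield augmented samples computed by a pool of threads."""
    # The with-block terminates the pool when the generator finishes
    # or is closed, even if the consumer stops iterating early.
    with ThreadPool(workers) as pool:
        yield from pool.imap_unordered(fake_augment, samples)


# Bound the input so the pool cannot queue up an unlimited number of samples.
augmented = augment_threaded_sample_gen(islice(sample_gen(), 32))
shapes = [x.shape for x, y in augmented]
print(len(shapes), shapes[0])
```

Timing this against the `Pool` version with the real `augment` would show whether the pickling overhead or the GIL is the bigger limit on a given machine.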