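I want to run several processes, each of which loads a different image at a time and runs inference on it (e.g. with VGG16).
I am using Keras with the TensorFlow backend on a single GPU (GTX 1070). Here is the code: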
import tensorflow as tf
import multiprocessing
from multiprocessing import Pool, Process, Queue
import os
from os.path import isfile, join
from PIL import Image
import time
from keras.applications.vgg16 import VGG16
import numpy as np
from keras.backend.tensorflow_backend import set_session
test_path = 'test path to images ...'
output = Queue()
def worker(file_names, output):
    # Cap each process at a quarter of the GPU memory so several workers can share one GPU.
    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 0.25
    config.gpu_options.visible_device_list = "0"
    set_session(tf.Session(config=config))
    inference_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3), pooling='avg')
    model_image_size = (224, 224)
    times = []
    for file_name in file_names:
        image = Image.open(os.path.join(test_path, file_name))
        im_width = image.size[0]
        im_height = image.size[1]
        # Crop the width symmetrically toward a square (assumes width >= height).
        m = (im_width - im_height) // 2
        image = image.crop((m, 0, im_width - m, im_height))
        image = image.resize(model_image_size, Image.BICUBIC)
        image = np.array(image, dtype='float32')
        image /= 255.
        image = np.expand_dims(image, 0)  # Add batch dimension.
        # Time only the forward pass, not image loading or preprocessing.
        start = time.time()
        res = inference_model.predict(image)
        end = time.time()
        elapsed_time = end - start
        print("elapsed time", elapsed_time)
        times.append(elapsed_time)
    # Leave the first two measurements out of the average.
    average_time = np.mean(times[2:])
    print("average time ", average_time)
if __name__ == '__main__':
    file_names = [f for f in os.listdir(test_path) if isfile(join(test_path, f))]
    file_names.sort()
    num_workers = 3
    processes = [Process(target=worker, args=(file_names[x::num_workers], output)) for x in range(num_workers)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
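I noticed that the per-image inference time is longer with multiple processes than with a single process. For example, with a single process, inference takes about 0.012 seconds per image. When running 3 processes I expected the same per-image time, but the average inference time per image is almost 0.02 seconds. What could be the reason? (Perhaps CUDA context switching?) Is there a way to fix this?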
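To put these numbers in context, here is a rough back-of-the-envelope sketch of the aggregate throughput they imply. The `throughput` helper is hypothetical, and it assumes the three processes overlap fully in time, which I have not verified:

```python
def throughput(per_image_seconds, num_processes=1):
    """Aggregate images per second.

    Assumption: all processes run concurrently and each one sustains
    the given per-image time.
    """
    return num_processes / per_image_seconds


single = throughput(0.012)    # one process, 0.012 s per image
multi = throughput(0.02, 3)   # three processes, about 0.02 s per image each

print(f"single process:  {single:.1f} images/s")
print(f"three processes: {multi:.1f} images/s")
print(f"ratio:           {multi / single:.2f}x")
```

If that assumption holds, the total throughput would still be higher with three processes even though each individual `predict` call is slower, so my question is specifically about the per-image time.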