When using Keras with the TensorFlow backend, there seems to be some idle time between inferences, as shown in the NVIDIA Visual Profiler. The attached terminal screenshots show that Keras spends ~80 ms per inference, while inspection in nvprof shows the computation itself takes ~62 ms (including the host-to-device and device-to-host copies); the rest of the time the GPU is essentially idle. In addition, the average inference time varies between roughly 80 and 150 ms each time I run the code below, and I am not sure why.
My model input has shape (1, 800, 700, 36), essentially a point cloud in a voxelized grid. The output is two tensors of shape (1, 200, 175, 1) and (1, 200, 175, 6).
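For reference, an input of this shape can be mocked with a plain array for timing experiments; this is only a sketch, and float32 is an assumption (the real data comes from my voxelization pipeline):

import numpy as np

# dummy input matching the model's input shape (1, 800, 700, 36)
dummy_input = np.zeros((1, 800, 700, 36), dtype=np.float32)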
Allow GPU memory to grow dynamically:
config.gpu_options.allow_growth = True
Limit the maximum GPU memory usage:
config.gpu_options.per_process_gpu_memory_fraction = 0.66
Set the learning phase for testing/inference:
keras.backend.set_learning_phase(0)
Batch size is set to None and steps to 1; setting batch size to 1 and steps to None made inference slower:
model.predict(input_data, batch_size=None, verbose=1, steps=1)
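For comparison, these are the two predict() variants referred to above (the second was the slower one in my runs):

# variant used in the script below: batch_size=None, steps=1
model.predict(input_data, batch_size=None, verbose=1, steps=1)
# slower variant: batch_size=1, steps left as None
model.predict(input_data, batch_size=1, verbose=1)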
Output of the Keras verbose logging in the terminal, showing each step taking ~80 ms
Same code; output of the Keras verbose logging in the terminal, showing each step taking ~100 ms
Since the NV Profiler timeline is long, I split the image into 3 parts and collapsed the process threads that were not running anything during this period. The images show that the actual computation takes ~60 ms, while the GPU does essentially nothing for ~20 ms. This idle period varies from run to run and sometimes reaches up to 70 ms.
Part 1: nothing is running in the interval between the green lines
Part 2: nothing is running in the interval between the green lines
Part 3: nothing is running in the interval between the green lines
import numpy as np
import tensorflow as tf
import keras.backend.tensorflow_backend
from keras.backend.tensorflow_backend import set_session
from keras.models import load_model

# GPU session configuration: allow memory growth and cap memory usage
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.66
sess = tf.Session(config=config)
set_session(sess)

# inference mode
keras.backend.set_learning_phase(0)

# custom objects needed to deserialize the model
# (assuming the Huber loss from tf.losses; replace with your own huber_loss if different)
losses = {
    "binary_crossentropy": "binary_crossentropy",
    "huber_loss": tf.losses.huber_loss,
}
model = load_model('/home/louis/pixor/models/city_trainin_set/pixor_model_10_0.008.hdf5',
                   custom_objects=losses)

# user-defined data loading (base_dir and data are defined elsewhere)
obj = massage_own_label(base_dir, data)
input_data = obj.process_input_data(obj.load_pc_data(0))

for i in range(0, 100):
    out_class, out_labels = model.predict(input_data, batch_size=None, verbose=1, steps=1)
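To quantify the 80–150 ms variation mentioned above, the same loop can be wrapped with a simple wall-clock timer. This is only a sketch: it measures Python/Keras overhead together with the GPU work, and the first call is skipped because it includes one-time graph setup:

import time

times = []
for i in range(0, 100):
    t0 = time.perf_counter()
    out_class, out_labels = model.predict(input_data, batch_size=None, steps=1)
    times.append(time.perf_counter() - t0)

print("mean %.1f ms, min %.1f ms, max %.1f ms" % (
    np.mean(times[1:]) * 1e3, np.min(times[1:]) * 1e3, np.max(times[1:]) * 1e3))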