I'm developing software that performs real-time person detection on multiple camera devices for a home surveillance system.
I'm currently using OpenCV to grab frames from the IP cameras and TensorFlow to analyze them and look for objects (the code is very similar to the one in the TF Object Detection API). I've also tried the different frozen inference graphs for the TensorFlow Object Detection API from this link:
My desktop machine has an Intel Core i7-6700 CPU @ 3.40GHz × 8, and my GPU is an NVIDIA GeForce GTX 960 Ti.
The software works as expected, but slower than hoped (3-5 FPS), and CPU usage is very high (80-90%) for a single Python script that handles only one camera device.
Am I doing something wrong? What is the best way to optimize performance, get better FPS and lower CPU usage, so that I can analyze more video feeds at once? So far I've looked into multithreading, but I don't know how to implement it in my code (a minimal threaded-capture sketch follows the code snippet below).
Code snippet:
with detection_graph.as_default():
    with tf.Session(graph=detection_graph) as sess:
        while True:
            # read() returning a bare frame, plus stop() below, suggests an
            # imutils-style threaded VideoStream rather than cv2.VideoCapture
            frame = cap.read()
            frame_expanded = np.expand_dims(frame, axis=0)
            # Note: these tensor lookups run on every frame; they could be hoisted above the loop
            image_tensor = detection_graph.get_tensor_by_name("image_tensor:0")
            boxes = detection_graph.get_tensor_by_name("detection_boxes:0")
            scores = detection_graph.get_tensor_by_name("detection_scores:0")
            classes = detection_graph.get_tensor_by_name("detection_classes:0")
            num_detections = detection_graph.get_tensor_by_name("num_detections:0")
            (boxes, scores, classes, num_detections) = sess.run(
                [boxes, scores, classes, num_detections],
                feed_dict={image_tensor: frame_expanded})
            vis_util.visualize_boxes_and_labels_on_image_array(frame, ...)
            cv2.imshow("video", frame)
            if cv2.waitKey(25) & 0xFF == ord("q"):
                cv2.destroyAllWindows()
                cap.stop()
                break
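On the multithreading question: one common pattern is to grab frames on a background thread so that capture I/O overlaps with sess.run(). The ThreadedCapture class below is an illustrative sketch, not from the original post; imutils.video.VideoStream provides a ready-made equivalent with the same read()/stop() interface used above.

import threading
import cv2

class ThreadedCapture:
    """Grab frames on a background thread; read() returns the newest frame."""
    def __init__(self, src):
        self.cap = cv2.VideoCapture(src)
        self.lock = threading.Lock()
        self.frame = None          # read() returns None until the first frame arrives
        self.running = True
        self.thread = threading.Thread(target=self._update, daemon=True)
        self.thread.start()

    def _update(self):
        # Keep overwriting with the latest frame; stale frames are dropped,
        # so inference always works on the most recent image.
        while self.running:
            ret, frame = self.cap.read()
            if not ret:
                continue
            with self.lock:
                self.frame = frame

    def read(self):
        with self.lock:
            return self.frame

    def stop(self):
        self.running = False
        self.thread.join()
        self.cap.release()

With this, the detection loop above can keep its cap.read() / cap.stop() calls unchanged while frame grabbing and inference run concurrently.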
Answer 0 (score: 1)
A few things I tried for my project that may help:
Run
nvidia-smi -l 5
and monitor GPU utilization and memory usage.
Create a small buffer between OpenCV and TF so the two don't compete for the same GPU resources:
from timeit import default_timer as timer  # used for per-batch timing below

BATCH_SIZE = 200
frameCount = 0  # start at 0 so each batch holds exactly BATCH_SIZE frames
images = []
while cap.isOpened() and frameCount <= 10000:
    ret, image_np = cap.read()
    if ret:
        frameCount += 1
        images.append(image_np)
        # Buffer frames until a full batch is ready, then run inference once
        if frameCount % BATCH_SIZE == 0:
            start = timer()
            output_dict_array = run_inference_for_images(images, detection_graph)
            end = timer()
            avg = (end - start) / len(images)
            print("TF inference took: " + str(end - start) + " for [" + str(len(images)) + "] images, average[" + str(avg) + "]")
            print("output array has:" + str(len(output_dict_array)))
            for idx in range(len(output_dict_array)):
                output_dict = output_dict_array[idx]
                image_np_org = images[idx]
                vis_util.visualize_boxes_and_labels_on_image_array(
                    image_np_org,
                    output_dict['detection_boxes'],
                    output_dict['detection_classes'],
                    output_dict['detection_scores'],
                    category_index,
                    instance_masks=output_dict.get('detection_masks'),
                    use_normalized_coordinates=True,
                    line_thickness=6)
                out.write(image_np_org)  # write annotated frames to a video file
                ##cv2.imshow('object image', image_np_org)
            del output_dict_array[:]
            del images[:]
    else:
        break
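The helper run_inference_for_images is not shown in the answer. Below is a minimal sketch of what a batched version could look like, assuming all frames share one shape so they can be stacked into a single image_tensor batch; the function name comes from the snippet above, but the body is a reconstruction, and in real code the Session should be created once and reused rather than opened per batch.

import numpy as np
import tensorflow as tf

def run_inference_for_images(images, graph):
    # Stack frames into one (N, H, W, 3) batch; every frame must have the same shape
    batch = np.stack(images, axis=0)
    with graph.as_default():
        # For brevity a Session is opened per call; reuse one Session in practice
        with tf.Session() as sess:
            image_tensor = graph.get_tensor_by_name("image_tensor:0")
            fetches = {
                "detection_boxes": graph.get_tensor_by_name("detection_boxes:0"),
                "detection_scores": graph.get_tensor_by_name("detection_scores:0"),
                "detection_classes": graph.get_tensor_by_name("detection_classes:0"),
                "num_detections": graph.get_tensor_by_name("num_detections:0"),
            }
            out = sess.run(fetches, feed_dict={image_tensor: batch})
    # Split the batched outputs into one dict per input frame, as the loop above expects
    return [
        {
            "detection_boxes": out["detection_boxes"][i],
            "detection_scores": out["detection_scores"][i],
            "detection_classes": out["detection_classes"][i].astype(np.int64),
            "num_detections": int(out["num_detections"][i]),
        }
        for i in range(len(images))
    ]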
Use a MobileNet model.
Resize the capture to 1280×720, save the capture to a file, and run inference on the file (a sketch covering these last two suggestions follows).
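A sketch of those two suggestions: loading a frozen SSD-MobileNet graph (TF1-style GraphDef loading, as in the Object Detection API tutorial) and pinning the capture to 1280×720 with a VideoWriter for saving. The model path, camera source, codec, and output filename are placeholders.

import cv2
import tensorflow as tf

PATH_TO_FROZEN_GRAPH = "ssd_mobilenet_v1_coco/frozen_inference_graph.pb"  # placeholder path
CAMERA_SRC = 0  # placeholder; an IP camera URL would go here

# Load the frozen detection graph (TF1 style)
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, "rb") as fid:
        od_graph_def.ParseFromString(fid.read())
        tf.import_graph_def(od_graph_def, name="")

# Ask the camera for 1280x720 frames (the driver may ignore unsupported sizes)
cap = cv2.VideoCapture(CAMERA_SRC)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

# Save the capture to a file so inference can run on the file afterwards
fourcc = cv2.VideoWriter_fourcc(*"XVID")
out = cv2.VideoWriter("capture.avi", fourcc, 20.0, (1280, 720))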
I did all of the above and achieved 12~16 FPS on a GTX 1060 (6GB) laptop:
2018-06-04 13:27:03.381783: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-06-04 13:27:03.381854: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-06-04 13:27:03.381895: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-06-04 13:27:03.381933: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-06-04 13:27:03.382069: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5211 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 with Max-Q Design, pci bus id: 0000:01:00.0, compute capability: 6.1)
===TF inference took: 8.62651109695 for [100] images, average[0.0862651109695]===