Tensorflow Object Detection API: RCNN is slow on CPU, 1 frame per minute

Asked: 2017-10-19 21:20:54

Tags: tensorflow object-detection

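I am using a locally trained model from the TensorFlow Object Detection API, starting from the faster_rcnn_inception_resnet_v2_atrous_coco_11_06_2017 checkpoint. I retrained it as a 1-class model and exported it to a SavedModel: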

python object_detection/export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path ${PIPELINE_CONFIG_PATH} \
    --trained_checkpoint_prefix /Users/Ben/Dropbox/GoogleCloud/Detection/train/model.ckpt-186 \
    --output_directory /Users/Ben/Dropbox/GoogleCloud/Detection/SavedModel/

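While I know that there are other, shallower models, the reported runtimes for RCNN are more than 100x faster than what I am seeing. Can anyone confirm this with their own Faster RCNN runtimes on CPU? I am trying to tell whether something is wrong with my code, or whether I should just move to a smaller model.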

I took the code from the Jupyter notebook with almost no changes. I am running in a clean virtualenv with only the requirements installed.

detection_predict.py

import numpy as np
import tensorflow as tf
from PIL import Image
import glob
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util
import os
import datetime

TEST_IMAGE_PATHS = glob.glob("/Users/Ben/Dropbox/GoogleCloud/Detection/images/validation/*.jpg")

# Size, in inches, of the output images. ?
IMAGE_SIZE = (12, 8)
NUM_CLASSES = 1

sess=tf.Session()
tf.saved_model.loader.load(sess,[tf.saved_model.tag_constants.SERVING], "/Users/ben/Dropbox/GoogleCloud/Detection/SavedModel/saved_model/")    

label_map = label_map_util.load_labelmap("label.pbtxt")
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    npdata=np.array(image.getdata()).reshape((im_height, im_width, 3)).astype(np.uint8)   
    return npdata

# Define input and output Tensors for sess.graph
image_tensor = sess.graph.get_tensor_by_name('image_tensor:0')

# Each box represents a part of the image where a particular object was detected.
detection_boxes = sess.graph.get_tensor_by_name('detection_boxes:0')

# Each score represents the level of confidence for each of the objects.
# Score is shown on the result image, together with the class label.
detection_scores = sess.graph.get_tensor_by_name('detection_scores:0')
detection_classes = sess.graph.get_tensor_by_name('detection_classes:0')
num_detections = sess.graph.get_tensor_by_name('num_detections:0')
for image_path in TEST_IMAGE_PATHS:

    image = Image.open(image_path)

    #basewidth = 300
    #wpercent = (basewidth/float(image.size[0]))
    #hsize = int((float(image.size[1])*float(wpercent)))
    #image = image.resize((basewidth,hsize), Image.ANTIALIAS)

    # the array based representation of the image will be used later in order to prepare the
    # result image with boxes and labels on it.
    image_np = load_image_into_numpy_array(image)

    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
    image_np_expanded = np.expand_dims(image_np, axis=0)
    # Actual detection.
    before = datetime.datetime.now()    
    (boxes, scores, classes, num) = sess.run([detection_boxes, detection_scores, detection_classes, num_detections],feed_dict={image_tensor: image_np_expanded})
    print("Prediction took : " + str(datetime.datetime.now() - before))  

    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(image_np, np.squeeze(boxes), np.squeeze(classes).astype(np.int32), np.squeeze(scores), category_index, use_normalized_coordinates=True,line_thickness=8)
    plt.figure(figsize=IMAGE_SIZE)
    fn=os.path.basename(image_path)
    plt.imsave("/Users/Ben/Dropbox/GoogleCloud/Detection/validation/" + fn,image_np)

Yields

(detection) Bens-MacBook-Pro:Detection ben$ python detection_predict.py 

Prediction took : 0:00:51.475269
Prediction took : 0:00:43.955962
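When comparing timings like these, it can help to report the first call separately from later ones, since the first `sess.run` often includes one-time setup. The following is a minimal sketch, assuming Python 3 (for `time.perf_counter`). `time_calls`, `mean` and `fake_detect` are hypothetical names for illustration, not part of the Object Detection API, and `fake_detect` only stands in for the real `sess.run` call.

```python
import time


def time_calls(fn, inputs, warmup=1):
    """Call fn on each input and return (warmup_times, steady_times) in seconds.

    The first `warmup` calls are reported separately because they may
    include one-time setup cost.
    """
    warm, steady = [], []
    for i, item in enumerate(inputs):
        start = time.perf_counter()
        fn(item)
        elapsed = time.perf_counter() - start
        (warm if i < warmup else steady).append(elapsed)
    return warm, steady


def mean(values):
    """Arithmetic mean, or NaN for an empty list."""
    return sum(values) / len(values) if values else float("nan")


if __name__ == "__main__":
    # Stand-in for: lambda img: sess.run(outputs, feed_dict={image_tensor: img})
    fake_detect = lambda item: sum(range(10000))
    warm, steady = time_calls(fake_detect, ["a.jpg", "b.jpg", "c.jpg"])
    print("first call: %.4fs, later calls (mean): %.4fs" % (mean(warm), mean(steady)))
```

If the later calls are still in the tens of seconds, the cost is in the model itself rather than in start-up.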

Resizing the images makes no difference (see the commented-out lines above). They are not huge (1280 x 720).

Is this expected?

System information


Latest TensorFlow version

Bens-MacBook-Pro:Detection ben$ python
Python 2.7.10 (default, Feb  7 2017, 00:08:15) 
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.__version__
'1.3.0'

Edit #1

In case anyone was wondering, predicting from the frozen inference graph makes no difference.

detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile("/Users/ben/Dropbox/GoogleCloud/Detection/SavedModel/frozen_inference_graph.pb", 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')

(detection) Bens-MacBook-Pro:Detection ben$ python detection_predict.py 

Prediction took : 0:01:02.651046
Prediction took : 0:00:43.820992
Prediction took : 0:00:48.805432

cProfile is not particularly illuminating

>>> stats.print_stats(20)
Thu Oct 19 14:55:47 2017    profiling_results

         40742812 function calls (38600273 primitive calls) in 173.800 seconds

   Ordered by: internal time
   List reduced from 4918 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        3  138.345   46.115  138.345   46.115 {_pywrap_tensorflow_internal.TF_Run}
977635/702731    2.852    0.000    9.200    0.000 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/google/protobuf/internal/python_message.py:469(init)
        3    2.597    0.866    2.597    0.866 {matplotlib._png.write_png}
    10719    2.111    0.000    2.114    0.000 {numpy.core.multiarray.array}
   363351    1.378    0.000    3.216    0.000 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/google/protobuf/internal/python_message.py:424(MakeSubMessageDefault)
  1045442    1.342    0.000    1.342    0.000 {_weakref.proxy}
562666/310637    1.317    0.000    6.182    0.000 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/google/protobuf/internal/python_message.py:1211(MergeFrom)
   931022    1.268    0.000    3.113    0.000 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/google/protobuf/internal/python_message.py:777(ListFields)
789671/269414    1.122    0.000    9.116    0.000 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/google/protobuf/internal/python_message.py:1008(ByteSize)
  1045442    0.882    0.000    2.498    0.000 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/google/protobuf/internal/python_message.py:1375(__init__)
3086143/3086140    0.662    0.000    0.756    0.000 {isinstance}
  1427511    0.656    0.000    0.782    0.000 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/google/protobuf/internal/python_message.py:762(_IsPresent)
   931092    0.649    0.000    0.879    0.000 {method 'sort' of 'list' objects}
1189105/899500    0.599    0.000    0.942    0.000 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/google/protobuf/internal/python_message.py:1330(Modified)
        1    0.537    0.537    0.537    0.537 {_pywrap_tensorflow_internal.TF_ExtendGraph}
276877/45671    0.480    0.000    8.315    0.000 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/google/protobuf/internal/python_message.py:1050(InternalSerialize)
  2602117    0.480    0.000    0.480    0.000 {method 'items' of 'dict' objects}
   459805    0.474    0.000    1.336    0.000 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/google/protobuf/internal/containers.py:551(__getitem__)
        1    0.434    0.434   16.605   16.605 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/tensorflow/python/framework/importer.py:156(import_graph_def)
  1297794    0.367    0.000    0.367    0.000 {method 'write' of '_io.BytesIO' objects}
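In the listing above, `TF_Run` accounts for 138.345 of the 173.800 seconds (about 80%), and that time is spent inside native TensorFlow code where cProfile cannot see further. For readers who want to produce a comparable listing, here is a minimal sketch using only the standard library, assuming Python 3. `profile_top` is a hypothetical helper name, and the lambda is a stand-in workload rather than the detection loop.

```python
import cProfile
import io
import pstats


def profile_top(fn, limit=20):
    """Run fn() under cProfile and return the top `limit` rows by internal time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        fn()
    finally:
        profiler.disable()
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # "tottime" sorts by time spent in the function itself, excluding callees.
    stats.sort_stats("tottime").print_stats(limit)
    return buf.getvalue()


if __name__ == "__main__":
    # Stand-in workload; in the script above this would wrap the detection loop.
    report = profile_top(lambda: sorted(range(200000), key=lambda v: -v))
    print(report)
```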

Edit #2

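After pushing to get this working, I am beginning to suspect that those reporting faster times did not rigorously record their environment. Some GPU reference points for those interested: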

https://github.com/tensorflow/models/issues/1715

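I am going to leave this question open in the hope that someone can report their CPU times for the largest models, but for now I am proceeding on the assumption that these numbers are correct and moving to shallower models. Perhaps this will help others decide which model to choose.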

6 answers:

Answer 0 (score: 7)

Hopefully this helps other users choose a model. Below are the average times I measured on OSX with a 3.1 GHz CPU (more information above).

faster_rcnn_inception_resnet_v2_atrous_coco: 45 sec/image

faster_rcnn_resnet101_coco: 16 sec/image

rfcn_resnet101_coco: 7 sec/image

ssd_inception_v2_coco: 0.3 sec/image

ssd_mobilenet_v1_coco: 0.3 sec/image
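To relate these figures to the question: 45 sec/image is roughly 1.3 images per minute, and the SSD models at 0.3 sec/image are about 150 times faster. The sketch below just does that arithmetic on a subset of the numbers listed above. `summarize` is a hypothetical helper name.

```python
# Average seconds per image, a subset of the figures listed above.
SECONDS_PER_IMAGE = {
    "faster_rcnn_inception_resnet_v2_atrous_coco": 45.0,
    "faster_rcnn_resnet101_coco": 16.0,
    "ssd_inception_v2_coco": 0.3,
    "ssd_mobilenet_v1_coco": 0.3,
}


def summarize(seconds_per_image):
    """Return (name, images_per_minute, speedup_vs_slowest) rows, fastest first."""
    slowest = max(seconds_per_image.values())
    rows = []
    for name, secs in sorted(seconds_per_image.items(), key=lambda kv: kv[1]):
        rows.append((name, 60.0 / secs, slowest / secs))
    return rows


if __name__ == "__main__":
    for name, per_minute, speedup in summarize(SECONDS_PER_IMAGE):
        print("%-45s %7.1f images/min  %6.1fx" % (name, per_minute, speedup))
```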

Answer 1 (score: 0)

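Trying the suggestions in the TensorFlow Performance Guide (general best practices and optimizing for CPU) may help. Specifically, installing TF from source and changing the input pipeline appear to improve performance.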

In addition, the Graph Transform Tool may be worth a try.

I have not tried any of the above myself, but I am really interested in their effect on performance.

Answer 2 (score: 0)

On my machine with 16 GB of RAM but only a 2.5 GHz Intel Core i5, the detection part alone takes:

  • ~5s / image with faster_rcnn_resnet101_coco_2018_01_28
  • ~1s / image with ssd_mobilenet_v1_coco_2017_11_17

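If you loop over multiple images or run on frames from a video, be careful about calling the run_inference_for_single_image method for every image. You may want to take the following two lines out of it and put them somewhere they are executed only once.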

with detection_graph.as_default():
    with tf.Session() as sess:
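The idea of creating the session once and reusing it can be sketched without TensorFlow. In the sketch below, `Detector` and `FakeSession` are hypothetical names for illustration only. With TF 1.x, the factory passed in would presumably be something like `lambda: tf.Session(graph=detection_graph)`, and `detect` would call `sess.run` with the output tensors and a `feed_dict`.

```python
class Detector(object):
    """Create the expensive session once and reuse it for every image."""

    def __init__(self, session_factory):
        # session_factory is called exactly once, here, not once per image.
        self._session = session_factory()

    def detect(self, image):
        # With a real tf.Session this would be
        # sess.run(output_tensors, feed_dict={image_tensor: image}).
        return self._session.run(image)


class FakeSession(object):
    """Stand-in for tf.Session that counts how often it is constructed."""

    created = 0

    def __init__(self):
        FakeSession.created += 1

    def run(self, image):
        return "detections for %s" % image


if __name__ == "__main__":
    detector = Detector(FakeSession)
    for path in ["a.jpg", "b.jpg", "c.jpg"]:
        print(detector.detect(path))
    print("sessions created: %d" % FakeSession.created)
```

The design choice is simply to tie session creation to object construction, so a loop over images or video frames cannot recreate it by accident.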

Answer 3 (score: 0)

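If you are using the example from the TensorFlow Object Detection Jupyter tutorial, the slow inference may be caused by the step that converts the image object into a numpy array. Here is an example that demonstrates this: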

import numpy as np
from PIL import Image
import time
def load_image_into_numpy_array(image):
  (im_width, im_height) = image.size
  return np.expand_dims(np.array(image.getdata()).reshape((im_height, im_width, 3)).astype(np.uint8),axis=0)
def load_image_into_numpy_array_updated(image):
  return np.expand_dims(np.array(image).astype(np.uint8),axis=0)
if __name__=='__main__':
  image = Image.open('xxx.JPEG')
  # original load method
  s= time.time()
  for _ in range(10):
    y = load_image_into_numpy_array(image)
  e= time.time()
  print('Execution Time of old load method {}'.format((e-s)/10))
  # updated load method
  s= time.time()
  for _ in range(10):
    y = load_image_into_numpy_array_updated(image)
  e= time.time()
  print('Execution Time of updated load method {}'.format((e-s)/10))

The results are as follows:

Execution Time of old load method 0.4671137571334839
Execution Time of updated load method 0.001219463348388672

The takeaway is that np.array(image.getdata()) is very slow. An alternative is to pass the PIL image object directly to np.array(), as shown in my code example.

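Another trick to speed up inference is to move the TF Session creation code out of the inference loop (create the Session once and use it for all subsequent inferences).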

PS: the image used in my test is 1280 x 720.

Answer 4 (score: 0)

On a MacBook Pro, processing a single image with faster_rcnn_resnet50_fgvc_2018_07_19 takes 8 minutes.

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   2740/1    0.044    0.000  471.234  471.234 {built-in method builtins.exec}
        1    0.319    0.319  471.227  471.227 detect_insect.py:1(<module>)
        1    0.004    0.004  355.473  355.473 detect_insect.py:72(run_inference_for_single_image)
        1    0.001    0.001  352.112  352.112 session.py:846(run)
        1    0.002    0.002  352.111  352.111 session.py:1091(_run)
        1    0.000    0.000  352.096  352.096 session.py:1318(_do_run)
        1    0.000    0.000  352.096  352.096 session.py:1363(_do_call)
        1    0.001    0.001  352.096  352.096 session.py:1346(_run_fn)
        1    0.002    0.002  347.445  347.445 session.py:1439(_call_tf_sessionrun)
        1  347.443  347.443  347.443  347.443 {built-in method _pywrap_tensorflow_internal.TF_SessionRun_wrapper}
        1    0.441    0.441   56.288   56.288 request.py:1775(retrieve)

Answer 5 (score: -1)

See the reply at the following link: https://medium.com/@vaibhavsahu/hey-ben-3a2ff902303d

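I am using an NVIDIA GeForce GTX 1060 6GB GPU. However, when you run your detection_predict.py (from Stack Overflow), loading the model into memory takes some time on every run. The model in this case is huge; mine is 180 MB. That is why you should load the model into memory once and run every detection against the loaded model. That way only the first detection pays the loading cost, and the following detections are faster. You can do this in a Jupyter notebook. Also, using the with statement for every detection increases detection time. In the given notebook: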

with detection_graph.as_default():
  with tf.Session(graph=detection_graph) as sess:

change this to:

with detection_graph.as_default():
  sess = tf.Session(graph=detection_graph)

Put that in its own cell and run it once. Then run the detection each time in another cell:

# Define input and output Tensors for detection_graph
image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
# Each box represents a part of the image where a particular object was detected.
detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
# Each score represents the level of confidence for each of the objects.
# Score is shown on the result image, together with the class label.
detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
num_detections = detection_graph.get_tensor_by_name('num_detections:0')
for image_path in TEST_IMAGE_PATHS:
  image = Image.open(image_path)
  # the array based representation of the image will be used later in order to prepare the
  # result image with boxes and labels on it.
  image_np = load_image_into_numpy_array(image)
  # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
  image_np_expanded = np.expand_dims(image_np, axis=0)
  # Actual detection.
  (boxes, scores, classes, num) = sess.run(
      [detection_boxes, detection_scores, detection_classes, num_detections],
      feed_dict={image_tensor: image_np_expanded})
  # Visualization of the results of a detection.
  vis_util.visualize_boxes_and_labels_on_image_array(
      image_np,
      np.squeeze(boxes),
      np.squeeze(classes).astype(np.int32),
      np.squeeze(scores),
      category_index,
      use_normalized_coordinates=True,
      line_thickness=8)
  plt.figure(figsize=IMAGE_SIZE)
  plt.imshow(image_np)

This should improve the time considerably.