Speeding up TensorFlow object detection inference

Time: 2019-12-19 09:07:28

Tags: python tensorflow deep-learning multiprocessing

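I started running inference with my own trained model on 3,600 frames and found that each frame takes 1.2-2 seconds; the whole run took about two hours. When I checked performance, CPU usage was 10%-15%, GPU usage was 1%-5%, and only a single process was running. My first idea was to use multiprocessing to speed the whole thing up, but I ran into an error. The code is below: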

    # # Object Detection Image
# Welcome to the object detection inference walkthrough!
# This notebook will walk you step by step through the process of using a pre-trained model to detect objects in an image.
# Make sure to follow the [installation instructions](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md) before you start.

# # Imports

import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile

from distutils.version import StrictVersion
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image

import multiprocessing
import time

# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")
from object_detection.utils import ops as utils_ops

if StrictVersion(tf.__version__) < StrictVersion('1.12.0'):
  raise ImportError('Please upgrade your TensorFlow installation to v1.12.*.')


# ## Env setup
# This is needed to display the images.
# get_ipython().run_line_magic('matplotlib', 'inline')


# ## Object detection imports
# Here are the imports from the object detection module.

from utils import label_map_util

from utils import visualization_utils as vis_util


# Model preparation

# ## Variables

# Any model exported using the `export_inference_graph.py` tool can be loaded here simply by changing `PATH_TO_FROZEN_GRAPH` to point to a new .pb file.

# By default we use an "SSD with Mobilenet" model here. See the [detection model zoo]
# (https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md)
# for a list of other models that can be run out-of-the-box with varying speeds and accuracies.

# What model to download.
MODEL_NAME = 'ssd_mobilenet_v0_walrus_13_12_2019'
MODEL_FILE = MODEL_NAME + '.tar.gz'
# DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'

# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_FROZEN_GRAPH = MODEL_NAME + '/frozen_inference_graph.pb'

# List of the strings used to add the correct label to each box.
# PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')
PATH_TO_LABELS = os.path.join('data', 'ssd_mobilenet_v0_walrus_13_12_2019_label_map.pbtxt')

NUM_CLASSES = 5



# ## Download Model
# opener = urllib.request.URLopener()
# opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
# tar_file = tarfile.open(MODEL_FILE)
# for file in tar_file.getmembers():
#   file_name = os.path.basename(file.name)
#   if 'frozen_inference_graph.pb' in file_name:
#     tar_file.extract(file, os.getcwd())


# ## Load a (frozen) Tensorflow model into memory.

detection_graph = tf.Graph()
with detection_graph.as_default():
  od_graph_def = tf.GraphDef()
  with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
    serialized_graph = fid.read()
    od_graph_def.ParseFromString(serialized_graph)
    tf.import_graph_def(od_graph_def, name='')


# ## Loading label map
# Label maps map indices to category names, so that when our convolution network predicts `5`, we know that this corresponds to `airplane`.
# Here we use internal utility functions, but anything that returns a dictionary mapping integers to appropriate string labels would be fine

category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)

# ## Helper code

def load_image_into_numpy_array(image):
  (im_width, im_height) = image.size
  return np.array(image.getdata()).reshape(
      (im_height, im_width, 3)).astype(np.uint8)

# # Detection

def run_inference_for_single_image(image, graph):
  with graph.as_default():
    with tf.Session() as sess:
      # Get handles to input and output tensors
      ops = tf.get_default_graph().get_operations()
      all_tensor_names = {output.name for op in ops for output in op.outputs}
      tensor_dict = {}
      for key in [
          'num_detections', 'detection_boxes', 'detection_scores',
          'detection_classes', 'detection_masks'
      ]:
        tensor_name = key + ':0'
        if tensor_name in all_tensor_names:
          tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(
              tensor_name)
      if 'detection_masks' in tensor_dict:
        # The following processing is only for single image
        detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
        detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
        # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size.
        real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
        detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
        detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
            detection_masks, detection_boxes, image.shape[1], image.shape[2])
        detection_masks_reframed = tf.cast(
            tf.greater(detection_masks_reframed, 0.5), tf.uint8)
        # Follow the convention by adding back the batch dimension
        tensor_dict['detection_masks'] = tf.expand_dims(
            detection_masks_reframed, 0)
      image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

      # Run inference
      output_dict = sess.run(tensor_dict,
                             feed_dict={image_tensor: image})

      # all outputs are float32 numpy arrays, so convert types as appropriate
      output_dict['num_detections'] = int(output_dict['num_detections'][0])
      output_dict['detection_classes'] = output_dict[
          'detection_classes'][0].astype(np.int64)
      output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
      output_dict['detection_scores'] = output_dict['detection_scores'][0]
      if 'detection_masks' in output_dict:
        output_dict['detection_masks'] = output_dict['detection_masks'][0]
  return output_dict


# Size, in inches, of the output images.
# IMAGE_SIZE = (18, 12)

# for image_path in TEST_IMAGE_PATHS:
def detector(image_path):
    image = Image.open(image_path)
    # the array based representation of the image will be used later in order to prepare the
    # result image with boxes and labels on it.
    image_np = load_image_into_numpy_array(image)
    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
    image_np_expanded = np.expand_dims(image_np, axis=0)
    # Actual detection.
    output_dict = run_inference_for_single_image(image_np_expanded, detection_graph)
    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        category_index,
        instance_masks=output_dict.get('detection_masks'),
        use_normalized_coordinates=True,
        line_thickness=8)
    # plt.figure(figsize=IMAGE_SIZE)
    # plt.imshow(image_np)

    #save to same folder as data input
    img = Image.fromarray(image_np)
    img.save(r'C:\temp\detected_frames\image{}.jpg'.format(image_path[len(image_path)-6:len(image_path)-4]))

# Load images:
PATH_TO_TEST_IMAGES_DIR = r'C:\temp\frames'
TEST_IMAGE_PATHS = [ os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(17, 20) ]

n_cpu = 6
start_time = time.time()
if __name__ == '__main__':
    pool = multiprocessing.Pool(processes = n_cpu)
    pool.map(detector, TEST_IMAGE_PATHS, chunksize = 1)
    pool.close()
    elapsed_time = time.time() - start_time
    print('time with multiprocessing: ',elapsed_time)

Here is the error:

    C:\Project Ready For Training\models\research\object_detection>python image_mp_object_detection.py
    WARNING:tensorflow:From image_mp_object_detection.py:86: The name tf.GraphDef is deprecated. Please use tf.compat.v1.GraphDef instead.

    WARNING:tensorflow:From image_mp_object_detection.py:87: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

    WARNING:tensorflow:From C:\Project Ready For Training\models\research\object_detection\image_mp_object_detection.py:86: The name tf.GraphDef is deprecated. Please use tf.compat.v1.GraphDef instead.

    WARNING:tensorflow:From C:\Project Ready For Training\models\research\object_detection\image_mp_object_detection.py:86: The name tf.GraphDef is deprecated. Please use tf.compat.v1.GraphDef instead.

    WARNING:tensorflow:From C:\Project Ready For Training\models\research\object_detection\image_mp_object_detection.py:87: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

    WARNING:tensorflow:From C:\Project Ready For Training\models\research\object_detection\image_mp_object_detection.py:87: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

    WARNING:tensorflow:From C:\Project Ready For Training\models\research\object_detection\image_mp_object_detection.py:110: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

    2019-12-19 10:59:53.893730: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library nvcuda.dll
    WARNING:tensorflow:From C:\Project Ready For Training\models\research\object_detection\image_mp_object_detection.py:110: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

    2019-12-19 10:59:53.898838: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library nvcuda.dll

    [From here the output of several worker processes is interleaved character by character, so the exact timestamps are not recoverable. The recoverable messages, each printed more than once, are:]
    I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
    name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.71
    pciBusID: 0000:01:00.0
    I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
    I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
    I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2

    2019-12-19 10:59:55.272484: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
    2019-12-19 10:59:55.276416: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0
    2019-12-19 10:59:55.278405: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N
    2019-12-19 10:59:55.282195: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6279 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:01:00.0, compute capability: 7.5)
    [timestamp lost to interleaving] I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
    2019-12-19 10:59:55.292241: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0
    2019-12-19 10:59:55.294398: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N
    WARNING:tensorflow:From C:\Project Ready For Training\models\research\object_detection\image_mp_object_detection.py:112: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

    2019-12-19 10:59:55.297425: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6279 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:01:00.0, compute capability: 7.5)
    WARNING:tensorflow:From C:\Project Ready For Training\models\research\object_detection\image_mp_object_detection.py:112: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

    2019-12-19 10:59:59.496535: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
    2019-12-19 10:59:59.501378: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
    2019-12-19 10:59:59.589089: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
    2019-12-19 10:59:59.593281: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
    2019-12-19 11:00:00.641238: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
    name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.71
    pciBusID: 0000:01:00.0
    2019-12-19 11:00:00.645818: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
    2019-12-19 11:00:00.649715: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
    2019-12-19 11:00:00.652491: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
    2019-12-19 11:00:00.655682: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0
    2019-12-19 11:00:00.657640: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N
    2019-12-19 11:00:00.660369: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6279 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:01:00.0, compute capability: 7.5)
    2019-12-19 11:00:01.925959: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
    2019-12-19 11:00:01.930184: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
    Traceback (most recent call last):
      File "image_mp_object_detection.py", line 193, in <module>
        pool.map(detector, TEST_IMAGE_PATHS, chunksize = 1)
      File "C:\Users\user\Anaconda3\lib\multiprocessing\pool.py", line 266, in map
        return self._map_async(func, iterable, mapstar, chunksize).get()
      File "C:\Users\user\Anaconda3\lib\multiprocessing\pool.py", line 644, in get
        raise self._value
    multiprocessing.pool.MaybeEncodingError: Error sending result: ''. Reason: 'TypeError("can't pickle _thread.RLock objects",)'
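One possible reading of the final `MaybeEncodingError`: `multiprocessing.Pool` pickles whatever a worker returns or raises before sending it back to the parent process. If a worker fails with an exception that holds a lock, the pickling step itself fails and the original error is hidden. Whether that is what happens here after the cuDNN errors is an assumption. The sketch below reproduces the effect with standard-library stand-ins only; `UnpicklableError`, `failing_detector`, `safe_detector` and `can_pickle` are hypothetical names, not part of the script in the question or of TensorFlow.

```python
import pickle
import threading
import traceback


class UnpicklableError(Exception):
    """Stand-in for a library exception that carries a lock with it."""

    def __init__(self, message):
        super().__init__(message)
        self.lock = threading.RLock()  # locks cannot be pickled


def failing_detector(image_path):
    # Hypothetical stand-in for detector(): always fails, like a worker
    # whose GPU initialisation went wrong.
    raise UnpicklableError("inference failed for " + image_path)


def safe_detector(image_path):
    """Run the detector and hand back only plain, picklable data."""
    try:
        failing_detector(image_path)
        return (image_path, None)
    except Exception:
        # A formatted traceback is a plain str, so it survives pickling.
        return (image_path, traceback.format_exc())


def can_pickle(obj):
    """Return True if obj can be serialised the way Pool would do it."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


if __name__ == "__main__":
    try:
        failing_detector("image17.jpg")
    except UnpicklableError as exc:
        print("raw exception picklable:", can_pickle(exc))
    result = safe_detector("image17.jpg")
    print("wrapped result picklable:", can_pickle(result))
    print(result[1])
```

Passing a wrapper like `safe_detector` to `pool.map` instead of the raw function would let the parent process print the worker's real traceback rather than the pickling error.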

I am using tensorflow-gpu==1.14. Any idea how to speed this up?
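One direction that may be worth trying before adding processes: `run_inference_for_single_image` opens a new `tf.Session()` for every image, and session start-up could plausibly account for a large share of the 1.2-2 seconds per frame. The sketch below shows only the shape of the change, opening one session and reusing it for all frames. It uses a counting stand-in instead of TensorFlow so it runs anywhere; `run_inference_for_images`, `FakeSession` and `fake_run_one` are hypothetical names.

```python
def run_inference_for_images(images, open_session, run_one):
    """Open a single session and reuse it for every image."""
    results = []
    with open_session() as sess:
        for image in images:
            results.append(run_one(sess, image))
    return results


class FakeSession:
    """Hypothetical stand-in for a TensorFlow session; counts start-ups."""

    starts = 0

    def __enter__(self):
        FakeSession.starts += 1  # the expensive start-up would happen here
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # do not swallow exceptions

    def run(self, image):
        # Pretend result; a real session would return the detection tensors.
        return {"num_detections": len(image)}


def fake_run_one(sess, image):
    return sess.run(image)


if __name__ == "__main__":
    frames = ["frame17", "frame18", "frame19"]
    outputs = run_inference_for_images(frames, FakeSession, fake_run_one)
    print(len(outputs), "frames,", FakeSession.starts, "session start-up(s)")
```

In the script above, `open_session` would correspond to something like `lambda: tf.Session(graph=detection_graph, config=config)` and `run_one` to the `sess.run(tensor_dict, feed_dict={image_tensor: image})` call, with the tensor lookup done once before the loop. In TensorFlow 1.x, a `tf.ConfigProto` with `gpu_options.allow_growth = True` (or `gpu_options.per_process_gpu_memory_fraction`) is the usual way to keep one process from reserving most of the GPU memory. That may be related to the `CUDNN_STATUS_ALLOC_FAILED` lines when several workers start at once, though this is untested against this setup.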

0 answers:

No answers