Question

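I followed the example here: https://www.youtube.com/watch?v=MoMjIwGSFVQ and got object detection working with a webcam.

I have since switched from the webcam to an RTSP stream from an IP camera, which I believe is H264. I now see about 30 seconds of lag in the video, and the video sometimes stops and starts.

Here is the Python code that does the main processing: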

import cv2
import numpy as np
import tensorflow as tf

# detection_graph, category_index and vis_util are defined elsewhere (not shown).
cap = cv2.VideoCapture("rtsp://192.168.200.1:5544/stream1")

# Running the tensorflow session
with detection_graph.as_default():
    with tf.Session(graph=detection_graph) as sess:
        while True:
            ret, image_np = cap.read()
            if not ret:
                # Stream ended or a frame could not be read.
                break

            # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
            image_np_expanded = np.expand_dims(image_np, axis=0)
            image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')

            # Each box represents a part of the image where a particular object was detected.
            boxes = detection_graph.get_tensor_by_name('detection_boxes:0')

            # Each score represents the level of confidence for each of the objects.
            # Score is shown on the result image, together with the class label.
            scores = detection_graph.get_tensor_by_name('detection_scores:0')
            classes = detection_graph.get_tensor_by_name('detection_classes:0')
            num_detections = detection_graph.get_tensor_by_name('num_detections:0')

            # Actual detection.
            (boxes, scores, classes, num_detections) = sess.run(
                [boxes, scores, classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})

            # Visualization of the results of a detection.
            vis_util.visualize_boxes_and_labels_on_image_array(
                image_np,
                np.squeeze(boxes),
                np.squeeze(classes).astype(np.int32),
                np.squeeze(scores),
                category_index,
                use_normalized_coordinates=True,
                line_thickness=8)

            # plt.figure(figsize=IMAGE_SIZE)
            # plt.imshow(image_np)
            cv2.imshow('image', cv2.resize(image_np, (1280, 960)))
            if cv2.waitKey(25) & 0xFF == ord('q'):
                break

cap.release()
cv2.destroyAllWindows()

I am new to Python and TensorFlow. Should this code be modified in any way to cope with the rtsp stream? My PC does not have a GPU card.

Answer 1

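Without a GPU, TensorFlow cannot process high-quality frames at a high fps. On my machine, processing one 640 * 480 frame took almost 0.2 seconds, so it can handle about 5 frames per second.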
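To get the equivalent number for your own machine, you can time the inference call directly. The following is only a minimal sketch: `average_seconds_per_call` is a hypothetical helper name, not part of TensorFlow or OpenCV, and the `sess.run(...)` call in the comment stands for whatever detection call you are using.

```python
import time


def average_seconds_per_call(fn, repeats=10):
    """Call fn() `repeats` times and return the mean wall-clock time per call."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


# Hypothetical usage inside the detection script:
#   seconds = average_seconds_per_call(
#       lambda: sess.run([boxes, scores, classes, num_detections],
#                        feed_dict={image_tensor: image_np_expanded}))
#   print("achievable fps:", 1.0 / seconds)
```

At 0.2 seconds per frame this gives 1 / 0.2 = 5 fps, which is the figure quoted above.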

There are two ways to get the code to run in real time.

Reduce the frame resolution
Reduce the fps

Code

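import cv2

cap = cv2.VideoCapture("rtsp://192.168.200.1:5544/stream1")
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)   # request frame width
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)  # request frame height
# cv2.cv.CV_CAP_PROP_FPS only exists in OpenCV 2.x; newer versions use cv2.CAP_PROP_FPS
cap.set(cv2.CAP_PROP_FPS, 5)             # request 5 fps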

Note: TensorFlow object detection works well even at low resolution.
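As far as I know, some capture backends ignore `cap.set()` for network streams, so it may be worth reducing the frame rate in software as well. The sketch below is one possible way to do that, not the only one: `read_every_nth` is a hypothetical helper, and it only assumes that the capture object has `grab()` and `retrieve()` methods, as `cv2.VideoCapture` does.

```python
def read_every_nth(capture, stride):
    """Yield every `stride`-th frame from `capture` and discard the rest.

    `capture` is any object with grab() -> bool and retrieve() -> (ok, frame),
    for example a cv2.VideoCapture.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    while True:
        # Advance past stride - 1 frames without fetching their images.
        for _ in range(stride - 1):
            if not capture.grab():
                return
        # Grab the frame we actually want and fetch its image.
        if not capture.grab():
            return
        ok, frame = capture.retrieve()
        if not ok:
            return
        yield frame


# Hypothetical usage, assuming the camera delivers roughly 30 fps:
#   for frame in read_every_nth(cap, 6):               # about 5 fps
#       small = cv2.resize(frame, (640, 480))          # reduce resolution
#       ...run detection on `small`...
```

The stride of 6 is only an example; pick it from your camera's real frame rate and the per-frame inference time you measure.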

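To try out GPU performance, floydhub offers a free GPU service (for a limited time). You can upload your code, run it on floydhub, and measure the performance. I found the GPU to be about 35 times faster than the CPU.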

Answer 2

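If 1080p @ 30fps works fine with your webcam but not over RTSP, then the extra overhead of decoding the RTSP stream may be putting too much load on your CPU. It cannot keep up with everything you are asking it to do at the same time. Memory could also be the bottleneck, but that seems unlikely.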

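Many Intel CPUs have an integrated GPU that can decode video natively. However, I have noticed that under certain conditions, with certain software, native decoding on some CPUs tends to lag a lot (by about 30 seconds). That may also be the problem you are running into here. It may be worth trying the software on a friend's computer whose hardware is of similar quality but not identical. You could also test on newer hardware in the same price range, since I have not seen this problem on the latest generation of Intel CPUs.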

Answer 3

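OpenCV's read() function works differently for USB webcams and IP cameras.

When run against an IP camera, it does not read the newest frame; it reads the oldest (next) frame.

Because the object detection inference in the loop takes a lot of time, read() quickly falls behind and ends up reading the oldest available frame in OpenCV's buffer.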

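One solution is to start a thread for the camera that reads frames and fills a queue. Then, in another thread, read frames from this queue and run the object detection inference on them.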
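A rough sketch of that idea follows. It is a variant rather than a literal implementation: I am assuming you want the most recent frame and not every frame, so the queue holds a single item and stale frames are dropped (an unbounded queue would just move the lag into the queue). `LatestFrameReader` is a hypothetical class name, and `capture` can be any object with a `read()` method that returns `(ok, frame)`, such as a `cv2.VideoCapture`.

```python
import queue
import threading


class LatestFrameReader:
    """Read frames on a background thread and keep only the newest one."""

    def __init__(self, capture):
        self._capture = capture
        self._frames = queue.Queue(maxsize=1)
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while not self._stopped.is_set():
            ok, frame = self._capture.read()
            if not ok:
                break  # stream ended or a frame could not be read
            # Throw away the previous frame if nobody consumed it yet.
            try:
                self._frames.get_nowait()
            except queue.Empty:
                pass
            # This thread is the only producer, so the queue has room here.
            self._frames.put(frame)
        self._stopped.set()

    def read(self, timeout=1.0):
        """Return the newest frame, or None if none arrives within `timeout`."""
        try:
            return self._frames.get(timeout=timeout)
        except queue.Empty:
            return None

    def stop(self):
        """Ask the reader thread to finish and wait for it."""
        self._stopped.set()
        self._thread.join()


# Hypothetical usage in the detection loop:
#   reader = LatestFrameReader(cv2.VideoCapture("rtsp://192.168.200.1:5544/stream1"))
#   while True:
#       image_np = reader.read()
#       if image_np is None:
#           break
#       ...run detection on image_np...
#   reader.stop()
```

With this arrangement the detection loop still runs at whatever rate the CPU allows, but each frame it processes should be close to live instead of falling further behind.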

TensorFlow object detection is slow when using an rtsp stream
