I am using this Faster R-CNN implementation: https://github.com/lev-kusanagi/Faster-RCNN_TF
The demo runs fine and works. In my project I send images from a robot to the pretrained model. After roughly 15 images have been sent, I get this error:
/usr/local/lib/python2.7/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Initializing frcnn...
2018-06-09 12:46:20.343027: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-06-09 12:46:20.456905: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-06-09 12:46:20.457885: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce 840M major: 5 minor: 0 memoryClockRate(GHz): 1.124
pciBusID: 0000:03:00.0
totalMemory: 1.96GiB freeMemory: 1.84GiB
2018-06-09 12:46:20.457924: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-06-09 12:46:25.081980: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-06-09 12:46:25.082022: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-06-09 12:46:25.082039: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-06-09 12:46:25.082268: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1605 MB memory) -> physical GPU (device: 0, name: GeForce 840M, pci bus id: 0000:03:00.0, compute capability: 5.0)
Tensor("Placeholder:0", shape=(?, ?, ?, 3), dtype=float32)
Tensor("conv5_3/conv5_3:0", shape=(?, ?, ?, 512), dtype=float32)
Tensor("rpn_conv/3x3/rpn_conv/3x3:0", shape=(?, ?, ?, 512), dtype=float32)
Tensor("rpn_cls_score/rpn_cls_score:0", shape=(?, ?, ?, 18), dtype=float32)
Tensor("rpn_cls_prob:0", shape=(?, ?, ?, ?), dtype=float32)
Tensor("rpn_cls_prob_reshape:0", shape=(?, ?, ?, 18), dtype=float32)
Tensor("rpn_bbox_pred/rpn_bbox_pred:0", shape=(?, ?, ?, 36), dtype=float32)
Tensor("Placeholder_1:0", shape=(?, 3), dtype=float32)
Tensor("conv5_3/conv5_3:0", shape=(?, ?, ?, 512), dtype=float32)
Tensor("rois:0", shape=(?, 5), dtype=float32)
[<tf.Tensor 'conv5_3/conv5_3:0' shape=(?, ?, ?, 512) dtype=float32>, <tf.Tensor 'rois:0' shape=(?, 5) dtype=float32>]
Tensor("fc7/fc7:0", shape=(?, 4096), dtype=float32)
Loaded network VGGnet_fast_rcnn_iter_25000.ckpt
2018-06-09 12:46:41.637686: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.23GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-06-09 12:46:41.861576: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 791.02MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-06-09 12:46:42.118830: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.32GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-06-09 12:46:42.440887: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.09GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-06-09 12:46:42.635119: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.19GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-06-09 12:46:42.927540: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.59GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-06-09 12:46:43.155943: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 627.19MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-06-09 12:46:43.449477: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 848.25MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-06-09 12:46:43.780302: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 610.59MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
Starting naoqi session...
2018-06-09 12:46:51.216501: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1,03GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
Detection took 2.971s for 50 object proposals
Detection took 0.790s for 50 object proposals
Detection took 0.803s for 50 object proposals
Detection took 0.794s for 50 object proposals
Detection took 0.793s for 50 object proposals
Detection took 0.793s for 50 object proposals
Detection took 0.790s for 50 object proposals
Detection took 0.803s for 50 object proposals
Detection took 0.798s for 50 object proposals
Detection took 0.788s for 50 object proposals
Detection took 0.797s for 50 object proposals
Detection took 0.798s for 50 object proposals
Detection took 0.793s for 50 object proposals
Detection took 0.802s for 50 object proposals
Detection took 0.805s for 50 object proposals
Detection took 0.795s for 50 object proposals
Detection took 0.798s for 50 object proposals
out of memory
invalid argument
an illegal memory access was encountered
2018-06-09 12:47:51.523140: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:650] failed to record completion event; therefore, failed to create inter-stream dependency
2018-06-09 12:47:51.523143: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:650] failed to record completion event; therefore, failed to create inter-stream dependency
2018-06-09 12:47:51.541009: E tensorflow/stream_executor/stream.cc:309] Error recording event in stream: error recording CUDA event on stream 0x44bca20: CUDA_ERROR_ILLEGAL_ADDRESS; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
2018-06-09 12:47:51.541009: I tensorflow/stream_executor/stream.cc:4737] stream 0x44bc950 did not memcpy host-to-device; source: 0x7f024c40f800
2018-06-09 12:47:51.541164: E tensorflow/stream_executor/cuda/cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS
2018-06-09 12:47:51.541197: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:208] Unexpected Event status: 1
Aborted (core dumped)
Apart from getting a better graphics card, is there a solution? Is there a way to free memory after each image is annotated, or is there a problem in my code, and where should I look for it?
Graphics card info:
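For example, would limiting how much GPU memory TensorFlow grabs help at all? This is just a sketch of what I mean, using the standard TensorFlow 1.x session options (I have not confirmed whether the repo's session setup exposes this, or whether it would fix anything):

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True                      # allocate GPU memory on demand instead of all at once
# config.gpu_options.per_process_gpu_memory_fraction = 0.8  # or cap the fraction of GPU memory used
sess = tf.Session(config=config)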
Sat Jun 9 13:41:59 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.26 Driver Version: 396.26 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce 840M Off | 00000000:03:00.0 Off | N/A |
| N/A 42C P5 N/A / N/A | 164MiB / 2004MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
Demo code: https://pastebin.com/uny48BQG. Running this code produces the same error. I put about 200 images in the folder, but it crashes after roughly 30 of them.
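For reference, the shape of my detection loop is roughly the following (paraphrased from the pastebin/demo code; the folder path is made up here, and im_detect comes from the repo's fast_rcnn package as far as I understand it):

import os
import cv2
from fast_rcnn.test import im_detect   # detection helper from the Faster-RCNN_TF repo

image_dir = 'images/'                   # hypothetical folder containing the ~200 images
for name in sorted(os.listdir(image_dir)):
    im = cv2.imread(os.path.join(image_dir, name))
    scores, boxes = im_detect(sess, net, im)   # same sess and net reused for every image
    # ... annotate/draw the boxes for this image ...

So the session and network are created once and reused for every image; nothing is rebuilt inside the loop as far as I can tell.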