我正在使用TiTAN在docker中运行tensorflow object_detection api。使用命令python object_detection/model_main.py --"pipeline_config_path object_detection/train_manhole/faster_rcnn_resnet101_coco.config --model_dir object_detection/train_manhole --alsologtostder
,我收到一个错误。
以下是错误信息:
root@a358c8644e9c:~/manhole/models/research# python object_detection/model_main.py --pipeline_config_path object_detection/train_manhole/faster_rcnn_resnet101_coco.config --model_dir object_detection/train_manhole --alsologtostder
/root/manhole/models/research/object_detection/utils/visualization_utils.py:26: UserWarning:
This call to matplotlib.use() has no effect because the backend has already
been chosen; matplotlib.use() must be called *before* pylab, matplotlib.pyplot,
or matplotlib.backends is imported for the first time.
The backend was *originally* set to 'TkAgg' by the following code:
File "object_detection/model_main.py", line 26, in <module>
from object_detection import model_lib
File "/root/manhole/models/research/object_detection/model_lib.py", line 27, in <module>
from object_detection import eval_util
File "/root/manhole/models/research/object_detection/eval_util.py", line 28, in <module>
from object_detection.metrics import coco_evaluation
File "/root/manhole/models/research/object_detection/metrics/coco_evaluation.py", line 20, in <module>
from object_detection.metrics import coco_tools
File "/root/manhole/models/research/object_detection/metrics/coco_tools.py", line 47, in <module>
from pycocotools import coco
File "/root/manhole/models/research/pycocotools/coco.py", line 49, in <module>
import matplotlib.pyplot as plt
File "/usr/local/lib/python3.5/dist-packages/matplotlib/pyplot.py", line 71, in <module>
from matplotlib.backends import pylab_setup
File "/usr/local/lib/python3.5/dist-packages/matplotlib/backends/__init__.py", line 16, in <module>
line for line in traceback.format_stack()
import matplotlib; matplotlib.use('Agg') # pylint: disable=multiple-statements
WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
WARNING:tensorflow:Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7f0a65a25e18>) includes params argument, but params are not passed to Estimator.
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /root/manhole/models/research/object_detection/builders/dataset_builder.py:80: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.parallel_interleave(...)`.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/sparse_ops.py:1165: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /root/manhole/models/research/object_detection/builders/dataset_builder.py:152: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.batch(..., drop_remainder=True)`.
WARNING:tensorflow:From /root/manhole/models/research/object_detection/predictors/heads/box_head.py:93: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
WARNING:tensorflow:From /root/manhole/models/research/object_detection/core/losses.py:345: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.
See `tf.nn.softmax_cross_entropy_with_logits_v2`.
/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gradients_impl.py:112: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
2019-03-29 02:38:24.469848: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-03-29 02:38:24.546573: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-03-29 02:38:24.547117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:01:00.0
totalMemory: 11.91GiB freeMemory: 11.41GiB
2019-03-29 02:38:24.547150: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-03-29 02:38:24.709319: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-29 02:38:24.709370: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-03-29 02:38:24.709377: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-03-29 02:38:24.709670: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11036 MB memory) -> physical GPU (device: 0, name: TITAN Xp, pci bus id: 0000:01:00.0, compute capability: 6.1)
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input shape axis 0 must equal 4, got shape [5]
[[{{node Preprocessor/ResizeToRange/cond/resize_images/unstack}} = Unpack[T=DT_INT32, axis=0, num=4, _device="/device:CPU:0"](Preprocessor/ResizeToRange/cond/resize_images/Shape)]]
[[{{node IteratorGetNext}} = IteratorGetNext[output_shapes=[[1], [1,?,?,3], [1,2], [1,3], [1,100], [1,100,4], [1,100,2], [1,100,2], [1,100], [1,100], [1,100], [1]], output_types=[DT_INT32, DT_FLOAT, DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_BOOL, DT_FLOAT, DT_INT32], _device="/job:localhost/replica:0/task:0/device:CPU:0"](IteratorV2)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "object_detection/model_main.py", line 109, in <module>
tf.app.run()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "object_detection/model_main.py", line 105, in main
tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/training.py", line 471, in train_and_evaluate
return executor.run()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/training.py", line 610, in run
return self.run_local()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/training.py", line 711, in run_local
saving_listeners=saving_listeners)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 354, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 1207, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 1241, in _train_model_default
saving_listeners)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 1471, in _train_with_estimator_spec
_, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 671, in run
run_metadata=run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 1156, in run
run_metadata=run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 1255, in run
raise six.reraise(*original_exc_info)
File "/usr/local/lib/python3.5/dist-packages/six.py", line 693, in reraise
raise value
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 1240, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 1312, in run
run_metadata=run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 1076, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input shape axis 0 must equal 4, got shape [5]
[[{{node Preprocessor/ResizeToRange/cond/resize_images/unstack}} = Unpack[T=DT_INT32, axis=0, num=4, _device="/device:CPU:0"](Preprocessor/ResizeToRange/cond/resize_images/Shape)]]
[[node IteratorGetNext (defined at object_detection/model_main.py:105) = IteratorGetNext[output_shapes=[[1], [1,?,?,3], [1,2], [1,3], [1,100], [1,100,4], [1,100,2], [1,100,2], [1,100], [1,100], [1,100], [1]], output_types=[DT_INT32, DT_FLOAT, DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_BOOL, DT_FLOAT, DT_INT32], _device="/job:localhost/replica:0/task:0/device:CPU:0"](IteratorV2)]]
我的Docker环境是:
== cat /etc/issue ===============================================
Linux a358c8644e9c 4.15.0-46-generic #49~16.04.1-Ubuntu SMP Tue Feb 12 17:45:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
VERSION="16.04.6 LTS (Xenial Xerus)"
VERSION_ID="16.04"
VERSION_CODENAME=xenial
== are we in docker =============================================
Yes
== compiler =====================================================
c++ (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
== uname -a =====================================================
Linux a358c8644e9c 4.15.0-46-generic #49~16.04.1-Ubuntu SMP Tue Feb 12 17:45:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
== check pips ===================================================
numpy 1.14.2
protobuf 3.7.1
tensorflow-estimator 1.13.0
tensorflow-gpu 1.12.0
== check for virtualenv =========================================
False
== tensorflow import ============================================
tf.VERSION = 1.12.0
tf.GIT_VERSION = v1.12.0-0-ga6d8ffae09
tf.COMPILER_VERSION = 4.8.5
Sanity check: array([1], dtype=int32)
== env ==========================================================
LD_LIBRARY_PATH /usr/local/cuda-9.0/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
DYLD_LIBRARY_PATH is unset
== nvidia-smi ===================================================
Fri Mar 29 02:57:43 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.78 Driver Version: 410.78 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN Xp Off | 00000000:01:00.0 On | N/A |
| 23% 36C P0 69W / 250W | 378MiB / 12192MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
== cuda libs ===================================================
/usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudart.so.9.0.176
/usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudart_static.a
感谢帮助!
答案 0 :(得分:0)
谢谢!我修好了它。原因是这些是一些图片破碎的数据集。而且,某些图片的深度超过3。我使用python脚本选择了导致上述错误的图片。最后,它可以正常运行。
答案 1 :(得分:0)
我遇到了同样的问题,问题是因为数据集(数据集中的图像损坏)。数据集中存在不正确的图像形状,如 2D 图像形状,正确的形状应该是 3D 形状。
解决的是通过比较图像的形状来删除非 3D 图像。完成!!!!