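I'm currently facing a strange error with the flower retraining example (https://www.tensorflow.org/versions/r0.9/how_tos/image_retraining/index.html).
TensorFlow release 0.9 was installed from source, and I tried to run the image_retraining Python script. It does start and creates some bottlenecks, but then fails with the error message below.
Does anyone know what the problem might be? I haven't found any similar posts.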
E tensorflow/core/kernels/check_numerics_op.cc:157] abnormal_detected_host @0x10007200300 = {1, 0} activation input is not finite.
Traceback (most recent call last):
File "examples/image_retraining/retrain.py", line 888, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv))
File "examples/image_retraining/retrain.py", line 798, in main
jpeg_data_tensor, bottleneck_tensor)
File "examples/image_retraining/retrain.py", line 456, in cache_bottlenecks
jpeg_data_tensor, bottleneck_tensor)
File "examples/image_retraining/retrain.py", line 414, in get_or_create_bottleneck
bottleneck_tensor)
File "examples/image_retraining/retrain.py", line 331, in run_bottleneck_on_image
{image_data_tensor: image_data})
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 382, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 655, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 723, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 743, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.InvalidArgumentError: activation input is not finite. : Tensor had NaN values
[[Node: conv_1/CheckNumerics = CheckNumerics[T=DT_FLOAT, message="activation input is not finite.", _device="/job:localhost/replica:0/task:0/gpu:0"](conv_1/batchnorm)]]
Caused by op u'conv_1/CheckNumerics', defined at:
File "examples/image_retraining/retrain.py", line 888, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv))
File "examples/image_retraining/retrain.py", line 769, in main
create_inception_graph())
File "examples/image_retraining/retrain.py", line 312, in create_inception_graph
RESIZED_INPUT_TENSOR_NAME]))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/importer.py", line 274, in import_graph_def
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2297, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1231, in __init__
self._traceback = _extract_stack()
Answer 0 (score: 3)
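Update: As a follow-up, I recommend using TensorFlow 1.6, because many operations are much faster. If you are running an Nvidia GPU, make sure you install CUDA 9.0 and not 9.1; 9.1 will break everything.
cuDNN has to match both CUDA 9.0 and the version TensorFlow was built against. For TensorFlow 1.6, be sure to install 7.0.4 rather than 7.1, and specifically the build that 1.6 was compiled with (otherwise it will break as well): for CUDA 9.0 (not 9.1), the exact version is cuDNN v7.0.4.31-1. The latest release (7.1.2 at the time of writing) will throw an error, because TensorFlow 1.6 was built with 7.0.4.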
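As a rough illustration of the version constraints above, here is a minimal sketch that checks a pair of version strings against the combination named in this answer for TensorFlow 1.6 (CUDA 9.0, cuDNN 7.0.4). The function names are hypothetical, and the script does not detect anything on its own: you would have to supply the version strings you found on your system.

```python
def parse_version(text):
    """Turn a string such as '7.0.4.31-1' into a tuple of ints: (7, 0, 4, 31)."""
    numeric_part = text.split("-")[0]  # drop a packaging suffix such as '-1'
    return tuple(int(piece) for piece in numeric_part.split("."))


def check_versions_for_tf16(cuda_version, cudnn_version):
    """Return a list of problems; an empty list means the pair matches.

    The expected values (CUDA 9.0, cuDNN 7.0.4) are the ones quoted in the
    answer for TensorFlow 1.6. Both function names are hypothetical helpers.
    """
    problems = []
    if parse_version(cuda_version)[:2] != (9, 0):
        problems.append("CUDA %s found, expected 9.0" % cuda_version)
    if parse_version(cudnn_version)[:3] != (7, 0, 4):
        problems.append("cuDNN %s found, expected 7.0.4" % cudnn_version)
    return problems


for problem in check_versions_for_tf16("9.1", "7.1.2"):
    print(problem)
```

An empty result only means the two strings match the versions mentioned here; it says nothing about whether the libraries are installed correctly.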
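Original post: This is a bug I ran into in TensorFlow as well (I'm using 2x GTX 1080 on Ubuntu 14.04).
One option is to install CUDA 8.0. However, CUDA 8.0 is not fully supported yet, and you may run into other problems.
If you are just experimenting, another way around this is to build it and run it on the CPU only, at least for the bottleneck generation phase: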
bazel build -c opt --copt=-mavx tensorflow/examples/image_retraining:retrain
bazel-bin/tensorflow/examples/image_retraining/retrain --image_dir ~/flower_photos
As you may know, if you have built TensorFlow with GPU support, then running it like this:
python tensorflow/examples/image_retraining/retrain.py --image_dir ~/flower_photos
will run it with GPU support, and you will probably hit the same error.
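If rebuilding is inconvenient, one possible alternative (my assumption, not something tested in this answer) is to hide the GPUs from the process by setting the CUDA_VISIBLE_DEVICES environment variable to an empty string, so that a GPU-enabled build should fall back to the CPU. A minimal sketch, where the helper names are hypothetical and the script path is the one used above:

```python
import os
import subprocess
import sys


def cpu_only_env(base_env=None):
    """Return a copy of the environment in which no CUDA device is visible."""
    env = dict(os.environ if base_env is None else base_env)
    # An empty CUDA_VISIBLE_DEVICES hides every GPU from the CUDA runtime.
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def run_retrain_on_cpu(image_dir,
                       script="tensorflow/examples/image_retraining/retrain.py"):
    """Launch retrain.py in a child process that cannot see any GPU.

    Returns the exit code of the child process.
    """
    cmd = [sys.executable, script, "--image_dir", os.path.expanduser(image_dir)]
    return subprocess.call(cmd, env=cpu_only_env())


# Example call (run from the TensorFlow source root):
#   run_retrain_on_cpu("~/flower_photos")
```

Whether this avoids the NaN error on a given setup is untested here; it only keeps the bottleneck generation off the GPU.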
I opened an issue here: https://github.com/tensorflow/tensorflow/issues/3560
Until they fix it, the workaround will do, as long as you don't have a large number of categories to classify.