假设我有一个非常小的数据集,只有50个图像。我想重新使用Red Pill教程中的代码,但是在每批训练中对随机变换应用随机变换,比如对亮度,对比度等进行随机更改。我只添加了一个函数:< / p>
def preprocessImages(x):
retValue = numpy.empty_like(x)
for i in range(50):
image = x[i]
image = tf.reshape(image, [28,28,1])
image = tf.image.random_brightness(image, max_delta=63)
#image = tf.image.random_contrast(image, lower=0.2, upper=1.8)
# Subtract off the mean and divide by the variance of the pixels.
float_image = tf.image.per_image_whitening(image)
float_image_Mat = sess.run(float_image)
retValue[i] = float_image_Mat.reshape((28*28))
return retValue
旧代码的小改动:
batch = mnist.train.next_batch(50)
for i in range(1000):
#batch = mnist.train.next_batch(50)
if i%100 == 0:
train_accuracy = accuracy.eval(feed_dict={
x:preprocessImages(batch[0]), y_: batch[1], keep_prob: 1.0})
print("step %d, training accuracy %g"%(i, train_accuracy))
train_step.run(feed_dict={x: preprocessImages(batch[0]), y_: batch[1], keep_prob: 0.5})
第一次迭代成功,之后崩溃:
step 0, training accuracy 0.02
W tensorflow/core/common_runtime/executor.cc:1027] 0x117e76c0 Compute status: Invalid argument: ReluGrad input is not finite. : Tensor had NaN values
[[Node: gradients_4/Relu_12_grad/Relu_12/CheckNumerics = CheckNumerics[T=DT_FLOAT, message="ReluGrad input is not finite.", _device="/job:localhost/replica:0/task:0/cpu:0"](add_16)]]
W tensorflow/core/common_runtime/executor.cc:1027] 0x117e76c0 Compute status: Invalid argument: ReluGrad input is not finite. : Tensor had NaN values
[[Node: gradients_4/Relu_13_grad/Relu_13/CheckNumerics = CheckNumerics[T=DT_FLOAT, message="ReluGrad input is not finite.", _device="/job:localhost/replica:0/task:0/cpu:0"](add_17)]]
W tensorflow/core/common_runtime/executor.cc:1027] 0x117e76c0 Compute status: Invalid argument: ReluGrad input is not finite. : Tensor had NaN values
[[Node: gradients_4/Relu_14_grad/Relu_14/CheckNumerics = CheckNumerics[T=DT_FLOAT, message="ReluGrad input is not finite.", _device="/job:localhost/replica:0/task:0/cpu:0"](add_18)]]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/media/sf_Data/mnistConv.py", line 69, in <module>
train_step.run(feed_dict={x: preprocessImages(batch[0]), y_: batch[1], keep_prob: 0.5})
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1267, in run
_run_using_default_session(self, feed_dict, self.graph, session)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2763, in _run_using_default_session
session.run(operation, feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 345, in run
results = self._do_run(target_list, unique_fetch_targets, feed_dict_string)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 419, in _do_run
e.code)
tensorflow.python.framework.errors.InvalidArgumentError: ReluGrad input is not finite. : Tensor had NaN values
[[Node: gradients_4/Relu_12_grad/Relu_12/CheckNumerics = CheckNumerics[T=DT_FLOAT, message="ReluGrad input is not finite.", _device="/job:localhost/replica:0/task:0/cpu:0"](add_16)]]
Caused by op u'gradients_4/Relu_12_grad/Relu_12/CheckNumerics', defined at:
File "<stdin>", line 1, in <module>
File "/media/sf_Data/mnistConv.py", line 58, in <module>
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 165, in minimize
gate_gradients=gate_gradients)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 205, in compute_gradients
loss, var_list, gate_gradients=(gate_gradients == Optimizer.GATE_OP))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients.py", line 414, in gradients
in_grads = _AsList(grad_fn(op_wrapper, *out_grads))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_grad.py", line 107, in _ReluGrad
t = _VerifyTensor(op.inputs[0], op.name, "ReluGrad input is not finite.")
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_grad.py", line 100, in _VerifyTensor
verify_input = array_ops.check_numerics(t, message=msg)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 48, in check_numerics
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/op_def_library.py", line 633, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1710, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 988, in __init__
self._traceback = _extract_stack()
...which was originally created as op u'Relu_12', defined at:
File "<stdin>", line 1, in <module>
File "/media/sf_Data/mnistConv.py", line 34, in <module>
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 506, in relu
return _op_def_lib.apply_op("Relu", features=features, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/op_def_library.py", line 633, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1710, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 988, in __init__
self._traceback = _extract_stack()
这与我使用50个培训示例的个人数据集得到的错误完全相同。
答案 0 :(得分:4)
首先要做的是:使用合并的tf.softmax_cross_entropy_with_logits
运算符,而不是计算y_conv然后计算交叉熵。这可能无法解决您的问题,但它比Red Pill示例中的天真版本更稳定。
其次,尝试在每次迭代时打印出cross_entropy。
cross_entropy = .... (previous code here)
cross_entropy = tf.Print(cross_entropy, [cross_entropy], "Cross-entropy: ")
了解它是否随着模型的进展而变为无穷大,或者只是跳到inf或NaN。如果它逐渐爆炸,那么它可能是学习率。如果它跳跃,它可能是一个可以如上解决的数值边界条件。如果它从一开始就在那里,你可能会在你应用扭曲的方式上出错,最终会以某种方式输入可怕的数据。