CTC loss error - sequence_length(0) <= 3

Date: 2018-02-20 16:24:17

Tags: python tensorflow convolution

I am facing an error similar to the one described in CTC Loss InvalidArgumentError: sequence_length(b) <= time, but there seems to be no explanation of what the error actually means. Based on the reading I have done, does it mean that the sequence length of example "0" in the minibatch must be at most 3? If so, why is it an error (since, as explained in the docs and in the question above, every sequence length must be less than or equal to the number of time steps, right?)... Can anyone explain how I can debug the problem and make sense of the error? I am adapting the existing conv2d example and trying to incorporate ctc_loss with some audio files I have.

The code lives here: https://github.com/takingstock/ServerSide-Algos/blob/master/ctc-conv.py and the problem occurs at line 213 (apologies for pasting a GitHub URL rather than the code itself... I felt it might be cleaner this way).

Stack trace:

Caused by op u'CTCLoss', defined at:
  File "conv_train.py", line 279, in <module>
    loss = tf.nn.ctc_loss(Y , logits, seq_len)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/ctc_ops.py", line 156, in ctc_loss
    ignore_longer_outputs_than_inputs=ignore_longer_outputs_than_inputs)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_ctc_ops.py", line 224, in _ctc_loss
    name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3160, in create_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1625, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): sequence_length(0) <= 3
     [[Node: CTCLoss = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=false, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](transpose, _arg_Placeholder_3_0_3, _arg_Placeholder_2_0_2, _arg_Placeholder_4_0_4)]]

1 Answer:

Answer 0: (score: 1)

It turned out the error was caused by the way I was feeding inputs into the ctc_loss function. The logits should have shape [max_timestep, batch_size, num_classes/labels], but I was sending them the other way around. Please check the updated code at the URL below... hopefully it will be useful to someone.

https://github.com/takingstock/ServerSide-Algos/blob/master/ctc_conv_corrected.py
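As a minimal, self-contained sketch of the fix (the placeholder names and sizes here are illustrative, not the repo's exact code): tf.nn.ctc_loss in TF 1.x expects time-major logits by default, so batch-major activations need one transpose.

import tensorflow as tf

batch_size, max_timestep, num_classes = 8, 399, 30

# Batch-major activations, e.g. straight out of tf.layers.dense:
# shape [batch_size, max_timestep, num_classes]
logits_batch_major = tf.placeholder(tf.float32, [batch_size, max_timestep, num_classes])

# ctc_loss defaults to time_major=True, so swap the first two axes
# to get [max_timestep, batch_size, num_classes]
logits_time_major = tf.transpose(logits_batch_major, (1, 0, 2))

labels = tf.sparse_placeholder(tf.int32)          # CTC labels must be a SparseTensor
seq_len = tf.placeholder(tf.int32, [batch_size])  # one length per example

loss = tf.nn.ctc_loss(labels, logits_time_major, seq_len)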

To be precise, this is the part of the code that was creating the problem:

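# Convolution Layer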
conv1 = conv2d(x, weights['wc1'], biases['bc1'])
# Max Pooling (down-sampling)
conv1 = maxpool2d(conv1, k=2)

# Convolution Layer
conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
# Max Pooling (down-sampling)
conv2 = maxpool2d(conv2, k=2)
# Fully connected layer
# Reshape conv2 output to fit fully connected layer input
fc1 = tf.reshape(conv2, [-1, weights['wd1'].get_shape().as_list()[0]])
fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
fc1 = tf.nn.relu(fc1)
# Apply Dropout
fc1 = tf.nn.dropout(fc1, dropout)
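To make the failure concrete, here is a hedged back-of-the-envelope sketch (the numbers are made up, and 'VALID'-style pooling with integer division is assumed): each k=2 max-pool halves the time axis, so the logits can easily end up with only 3 time steps.

# Hypothetical numbers, for illustration only:
time_steps = 15                 # time axis entering the conv stack
after_pool1 = time_steps // 2   # 7 steps after the first k=2 max-pool
after_pool2 = after_pool1 // 2  # 3 steps after the second
# ctc_loss then requires sequence_length(b) <= 3 for every example b,
# which is exactly what the InvalidArgumentError reports.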

If you notice, adding pooling reduces the dimensionality of the data that needs to be fed into ctc_loss. Also, in my personal experience (and quite a bit of the literature I have read), pooling does not help much (at least not for non-image convolutions), so I replaced the above with the following:
x = tf.reshape(X, shape=[-1, num_features, 399 , 1])
# Convolution Layer
conv1 = conv2d(x, weights['wc1'], biases['bc1'], 1)
fc1 = tf.reshape(conv1, [batch_size, 399, weights['wd1'].get_shape().as_list()[0]])
fc1 = tf.layers.dense( fc1, 1024 , activation=tf.nn.relu)
# Apply Dropout
fc1 = tf.nn.dropout(fc1, keep_prob)
# Output, class prediction
logits = tf.layers.dense(inputs=fc1, units=num_classes, activation=tf.nn.relu)

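# Transpose from batch-major [batch_size, max_ts, num_classes] to
# time-major [max_ts, batch_size, num_classes], which ctc_loss expects by default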
logits = tf.transpose(logits, (1, 0, 2))

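# Y must be a tf.SparseTensor of label indices (ctc_loss requires sparse labels);
# seq_len holds the per-example number of valid time steps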
loss = tf.nn.ctc_loss(Y , logits, seq_len)

This way, the input going into ctc_loss has the exact [max_ts, batch, label] format. The results with just one conv layer were also better than a BiRNN (on my data, at least)... Also, this thread proved to be very helpful for intuition (on using convolutions with ctc_loss): How to use tf.nn.ctc_loss in cnn+ctc network
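As a final sanity check (a hedged sketch of my own; seq_len_values is a hypothetical numpy array holding the per-example lengths you plan to feed), you can verify the lengths against the logits time axis before training:

# logits is the time-major tensor built above: [max_ts, batch, classes]
max_ts = logits.get_shape().as_list()[0]
bad = [(i, l) for i, l in enumerate(seq_len_values) if l > max_ts]
if bad:
    raise ValueError("sequence_length(%d) = %d exceeds the %d time steps in the logits"
                     % (bad[0][0], bad[0][1], max_ts))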