Question

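I have been following two Convolutional Neural Network tutorials: a YouTube video that uses TensorFlow, and the TensorFlow CNN tutorial on the MNIST dataset. I am using these tutorials to build my own CNN on audio data. The goal is to use a CNN to identify the voices of 33 speakers. The data has already been wrangled so that the test set has shape (8404, 1, 500, 1), which allows convolution to be applied. Each audio segment has length 500, and there are 8404 samples in the test set. My problem is in the training step. I get the following error: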

ValueError: Cannot feed value of shape (128, 1, 500, 1) for Tensor 'Placeholder:0', which has shape '(?, 500)'

I Googled this ValueError, and people fixed it by reshaping batch_x to the expected dimensions. So I tried the following code:
batch_x = np.reshape(batch_x, [-1, 500])

I had no luck with this reshape. Has anyone solved this problem? The code is below.

import numpy as np
import tensorflow as tf
npzfile = np.load('saved_data_file_33.npz')

train_segs = npzfile['train_segs']              # Seg train data
train_labels = npzfile['train_labels']          # train labels
train_labels_1h = npzfile['train_labels_1h']    # One hot encoding for training data
epochs = 1
batch_size = 128
learning_rate = 0.01
classes = len(train_labels_1h[0,:])  # 33 classes
seg_size = len(train_segs[0,0,:,0])  # 500 long

x = tf.placeholder(tf.float32, [None, seg_size])
y = tf.placeholder(tf.float32)

# This section initializes the weights and biases of each hidden layer and the output layer with random values.
# These values are stored in a dict for easy access.
weights = {"conv1" : tf.Variable(tf.random_normal([5, 5, 1, 32])),
            "conv2": tf.Variable(tf.random_normal([5, 5, 32, 64])),
            "fc_layer": tf.Variable(tf.random_normal([1*125*64, 1024])),
            "output": tf.Variable(tf.random_normal([1024, classes]))
            }
biases = { "b_c1" : tf.Variable(tf.random_normal([32])),
            "b_c2" : tf.Variable(tf.random_normal([64])),
            "b_fc" : tf.Variable(tf.random_normal([1024])),
            "output": tf.Variable(tf.random_normal([classes]))
            }

reshapedX = tf.reshape(x, [-1, 1, 500, 1])

conv1 = tf.nn.conv2d(reshapedX, weights["conv1"], strides = [1, 1, 1, 1], padding = "SAME")
conv1 = tf.nn.relu(conv1 + biases["b_c1"])
conv1 = tf.nn.max_pool(conv1, ksize = [1, 1, 2, 1], strides = [1, 1, 2, 1], padding = "SAME")

conv2 = tf.nn.conv2d(conv1, weights["conv2"], strides = [1, 1, 1, 1], padding = "SAME")
conv2 = tf.nn.relu(conv2 + biases["b_c2"])
conv2 = tf.nn.max_pool(conv2, ksize = [1, 1, 2, 1], strides = [1, 1, 2, 1], padding = "SAME")

fc = tf.reshape(conv2, [-1, 1*125*64])
fc = tf.nn.relu(tf.matmul(fc, weights["fc_layer"]) + biases["b_fc"])

output_layer = tf.matmul(fc, weights["output"]) + biases["output"]

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=output_layer))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(output_layer, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  for i in range(epochs):
      j = 0
      while j < len(train_segs):
          # Slice the next mini-batch out of the training data
          start = j
          end = j + batch_size
          batch_x = np.array(train_segs[start:end])
          batch_y = np.array(train_labels[start:end])
          #batch_x = np.reshape(batch_x, [-1, 500]) # reshape for x input
          train_accuracy = accuracy.eval(feed_dict={x: batch_x, y: batch_y})
          print('step %d, training accuracy %g' % (i, train_accuracy))
          train_step.run(feed_dict={x: batch_x, y: batch_y})
          j += batch_size  # advance to the next batch

  print('test accuracy %g' % accuracy.eval(feed_dict={
      x: train_segs, y: train_labels}))

Answer 1

It looks like you want to remove the dimensions of size 1 from train_segs. You can use train_segs = np.squeeze(train_segs).
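
As a rough illustration of the squeeze suggestion, here is a minimal NumPy-only sketch. The zero-filled arrays are stand-ins I made up for the asker's data (the real train_segs comes from the .npz file), and the names squeezed and single are hypothetical.

```python
import numpy as np

# Stand-in for one batch of the asker's data: 128 segments, each shaped (1, 500, 1).
batch_x = np.zeros((128, 1, 500, 1), dtype=np.float32)

# With no axis argument, np.squeeze drops every dimension of size 1.
squeezed = np.squeeze(batch_x)
print(squeezed.shape)

# Caveat: a batch that holds a single sample also loses its batch dimension.
single = np.zeros((1, 1, 500, 1), dtype=np.float32)
print(np.squeeze(single).shape)

# Naming the axes to drop keeps the batch dimension in that case.
print(np.squeeze(single, axis=(1, 3)).shape)
```

If the last batch of an epoch could ever contain just one segment, passing axis=(1, 3) is probably the safer choice, since the result then always has the (batch, 500) layout that the placeholder expects.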

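Also, I think you used the wrong brackets with np.reshape, so np.reshape(batch_x, (-1, 500)) might work. In general, you need to be careful with reshape functions, because the elements may not end up in the order you expect.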
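
To make the ordering caveat concrete, here is a small sketch with toy arrays (the sizes and the names as_list, as_tuple, stored, wrong and right are made up for illustration). As far as I know, np.reshape accepts the target shape as either a list or a tuple, so the sketch checks both spellings as well.

```python
import numpy as np

# Toy stand-in: 2 segments of length 5, laid out as (batch, 1, length, 1).
batch_x = np.arange(10, dtype=np.float32).reshape(2, 1, 5, 1)

# Shape given as a list and as a tuple.
as_list = np.reshape(batch_x, [-1, 5])
as_tuple = np.reshape(batch_x, (-1, 5))
same = np.array_equal(as_list, as_tuple)

# reshape reads elements in row-major order, so with this layout
# each output row is exactly one original segment.
rows_match = np.array_equal(as_tuple[1], batch_x[1, 0, :, 0])

# Where reshape goes wrong: data stored as (length, batch).
stored = np.arange(10).reshape(5, 2)  # 5 time steps, 2 segments
wrong = stored.reshape(2, 5)          # mixes values from both segments
right = stored.T                      # transpose keeps each segment intact

print(same, rows_match)
print(wrong[0], right[0])
```

For the (batch, 1, 500, 1) layout described in the question, a plain reshape to (-1, 500) should keep each segment on its own row; the transpose case only matters when the batch axis is not the leading one.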

Tensorflow incorrect shape during training
