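I am currently using TensorFlow and trying to implement a neural network for regression. The regression maps an input to an output. In my case the input is audio files that have been sampled and split into frames, and the output each frame has to be mapped to is a set of MFCC features.
The input is currently stored as follows: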
#One audio set
[array([[frame],[frame],...,[frame]],dtype=float32)]
The output is currently stored as follows:
[array([[Feature1, Feature2, Feature3,
Feature4, Feature5, Feature6,
Feature7, Feature8, Feature9,
Feature10, Feature11, Feature12,
Feature13],....,[...]])]
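The model I am currently trying to feed this into is a simple linear model. But since neither the input nor the output is a one-dimensional data set, I somehow have to make it handle vector-sized data.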
# Set model weights
W = tf.Variable(rng.randn(), name="weight")
b = tf.Variable(rng.randn(), name="bias")
# Construct a linear model
pred = tf.add(tf.mul(X, W), b)
Evaluating solutions
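One solution is to flatten the input and output, exploit the fact that every frame and every feature vector has a consistent length, make the weight W a matrix of size [frame_length, feature_length], and change the bias to have length feature_length.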
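To make the intended shapes concrete, here is a minimal NumPy sketch of that idea. The sizes (5 frames, 400 samples per frame) are made up for illustration; only the 13 features per frame come from the output layout above.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
num_frames = 5        # frames in one audio set
frame_length = 400    # samples per frame
feature_length = 13   # MFCC features per frame

rng = np.random.RandomState(0)

# One audio set: one row per frame.
frames = rng.randn(num_frames, frame_length).astype(np.float32)

# Weight matrix of size [frame_length, feature_length], bias of length feature_length.
W = rng.randn(frame_length, feature_length).astype(np.float32)
b = np.zeros(feature_length, dtype=np.float32)

# Linear model: every frame is mapped to one 13-element feature vector.
pred = frames.dot(W) + b
print(pred.shape)
```

With these shapes the bias is broadcast across the rows, so each frame gets the same offset added to its feature vector.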
Here is my attempt at doing this:
############################### Training setup ##################################
# Parameters
learning_rate = 0.01
training_epochs = 1000
display_step = 50
# tf Graph Input
X = tf.placeholder(tf.float32, [None])
Y = tf.placeholder(tf.float32, [None])
X_flatten = tf.reshape(X,[1,-1])
Y_flatten = tf.reshape(Y,[1,-1])
# Set model weights
W = tf.Variable(rng.randn(), name="weight")
W = tf.get_variable(name="W", shape=[train_set_data[0].shape[0],train_set_output[0].shape[0]])
b = tf.Variable(rng.randn(), name="bias")
b = tf.get_variable(name="b",shape=[1,train_set_output[0].shape[0]])
# Construct a linear model
pred = tf.add(tf.matmul(X, W), b)
# Mean squared error
cost = tf.nn.softmax(pred)
# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
# Initializing the variables
init = tf.initialize_all_variables()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Fit all training data
    for epoch in range(training_epochs):
        for (x, y) in zip(train_set_data, train_set_output):
            sess.run(optimizer, feed_dict={X: x, Y: y})
        #Display logs per epoch step
        if (epoch+1) % display_step == 0:
            c = sess.run(cost, feed_dict={X: train_set_data, Y:train_set_output})
            print "Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(c), \
                "W=", sess.run(W), "b=", sess.run(b)
    print "Optimization Finished!"
    training_cost = sess.run(cost, feed_dict={X: train_set_data, Y: train_set_output})
    print "Training cost=", training_cost, "W=", sess.run(W), "b=", sess.run(b), '\n'
    #Graphic display
    plt.plot(train_set_data, train_set_output, 'ro', label='Original data')
    plt.plot(train_set_data, sess.run(W) * train_set_data + sess.run(b), label='Fitted line')
    plt.legend()
    plt.show()
The problem is that I get an error message that I am not sure I understand:
Traceback (most recent call last):
  File "tensorflow_datapreprocess_mfcc_extraction_rnn.py", line 177, in <module>
    pred = tf.add(tf.matmul(X, W), b)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 1036, in matmul
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 911, in _mat_mul
    transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/op_def_library.py", line 655, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2156, in create_op
    set_shapes_for_outputs(ret)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1612, in set_shapes_for_outputs
    shapes = shape_func(op)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/common_shapes.py", line 81, in matmul_shape
    a_shape = op.inputs[0].get_shape().with_rank(2)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/tensor_shape.py", line 625, in with_rank
    raise ValueError("Shape %s must have rank %d" % (self, rank))
ValueError: Shape (?,) must have rank 2
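Could someone explain why I am getting this error, or suggest a different solution to the one I have provided? I don't like this solution, because it changes the initial input/output structure instead of using the structure created by the earlier preprocessing.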
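For reference, the shape in the error, (?,), is the shape of a placeholder declared with [None], which has rank 1, and the line in the traceback passes X rather than X_flatten to tf.matmul. A minimal NumPy sketch of the rank difference follows; the frame length of 400 and the count of 3 frames are hypothetical values, not taken from my data.

```python
import numpy as np

frame_length = 400  # hypothetical samples per frame
num_frames = 3      # hypothetical number of frames

# Analogous to a placeholder declared with shape [None]: a single axis, rank 1.
x_flat = np.zeros(num_frames * frame_length, dtype=np.float32)

# Analogous to a placeholder declared with shape [None, frame_length]: rank 2,
# one row per frame, which is the layout the preprocessing already produces.
x_2d = x_flat.reshape(-1, frame_length)

print(x_flat.ndim, x_flat.shape)
print(x_2d.ndim, x_2d.shape)
```

Only the second layout has the two axes that the shape check in the traceback (with_rank(2)) asks for.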