I have a simple structure that I learned from one of Siraj Raval's videos on a single-layer perceptron in TensorFlow, and I am trying to extend it to a larger number of layers, which is where I am running into difficulty.
The first example has 2 inputs and 2 outputs: one set of weights and biases is applied, and then softmax is applied to the output.
The second example has 2 inputs and 2 outputs with a hidden layer (2 units) in between, so there are two sets of weights and biases, and softmax is applied after each of them.
I am trying to extend the simple case to an N-hidden-layer case, but with limited success: when I add extra layers, they seem to be ignored by the optimizer.
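Roughly, the N-layer version I have in mind just stacks the same matmul/softmax pattern in a loop. The sketch below only shows the idea; the layer sizes and the random_normal initialization are placeholders for illustration, not the exact code I ran:

import tensorflow as tf

# sketch only: stack len(layer_sizes)-1 weight/bias pairs, applying softmax after each layer
layer_sizes = [2, 4, 4, 2]   # input width, two hidden layers, output width (illustrative values)
x = tf.placeholder(tf.float32, [None, layer_sizes[0]])
y_ = tf.placeholder(tf.float32, [None, layer_sizes[-1]])

activation = x
for i in range(len(layer_sizes) - 1):
    W_i = tf.Variable(tf.random_normal([layer_sizes[i], layer_sizes[i + 1]]))
    b_i = tf.Variable(tf.zeros([layer_sizes[i + 1]]))
    activation = tf.nn.softmax(tf.add(tf.matmul(activation, W_i), b_i))
y = activation   # prediction of the final layer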
The input is of the format:
inputX = np.array([[ 2.10400000e+03, 3.00000000e+00],
[ 1.60000000e+03, 3.00000000e+00],
[ 2.40000000e+03, 3.00000000e+00],
[ 1.41600000e+03, 2.00000000e+00],
[ 3.00000000e+03, 4.00000000e+00],
[ 1.98500000e+03, 4.00000000e+00],
[ 1.53400000e+03, 3.00000000e+00],
[ 1.42700000e+03, 3.00000000e+00],
[ 1.38000000e+03, 3.00000000e+00],
[ 1.49400000e+03, 3.00000000e+00]])
and the output labels are of the format:
inputY = np.array([[1, 0],
[1, 0],
[1, 0],
[0, 1],
[0, 1],
[1, 0],
[0, 1],
[1, 0],
[1, 0],
[1, 0]])
My snippet of code executes correctly (the dependencies are numpy and tensorflow):
import numpy as np
import tensorflow as tf

# hyperparameters: my full script sets these elsewhere; the values below are placeholders
# so that this excerpt is self-contained
alpha = 0.001
training_epochs = 1000
display_step = 50
n_samples = inputX.shape[0]

#input and output placeholder, feed data to x, feed labels to y_
x = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 2])
#first layer weights and biases
W = tf.Variable(tf.zeros([2,2]))
b = tf.Variable(tf.zeros([2]))
# vector form of x*W + b
y_values = tf.add(tf.matmul(x, W), b)
#activation function
y = tf.nn.softmax(y_values)
cost = tf.reduce_sum(tf.pow(y_ - y, 2))/(n_samples) #sum of squared errors, averaged over the samples
optimizer = tf.train.AdamOptimizer(alpha).minimize(cost)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(training_epochs):
    sess.run(optimizer, feed_dict = {x: inputX, y_:inputY})
    #log training
    if i % display_step == 0:
        cc = sess.run(cost, feed_dict = {x: inputX, y_:inputY})
        print("Training step:", '%04d' % (i), "cost=", "{:.9f}".format(cc))
print("Optimization Finished!")
training_cost = sess.run(cost, feed_dict = {x: inputX, y_: inputY})
print("Training cost = ", training_cost, "\nW=", sess.run(W), "\nb=", sess.run(b))
#check what it thinks when you give it the input data
print(sess.run(y, feed_dict = {x:inputX}))
and I get the output:
W= [[ 0.00021142 -0.00021142]
[ 0.00120122 -0.00120122]]
b= [ 0.00103542 -0.00103542]
label_predictions = [[ 0.71073025 0.28926972]
[ 0.66503692 0.33496314]
[ 0.73576927 0.2642307 ]
[ 0.64694035 0.35305965]
[ 0.78248388 0.21751612]
[ 0.70078063 0.2992194 ]
[ 0.65879178 0.34120819]
[ 0.6485498 0.3514502 ]
[ 0.64400673 0.3559933 ]
[ 0.65497971 0.34502029]]
Not great, so I wanted to try increasing the number of layers to see whether that would improve things. I added an extra layer, using new variables W2, b2, and hidden_layer:
# same imports and hyperparameters (alpha, training_epochs, display_step, n_samples) as above
#input and output placeholder, feed data to x, feed labels to y_
x = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 2])
#first layer weights and biases
W = tf.Variable(tf.zeros([2,2]))
b = tf.Variable(tf.zeros([2]))
#second layer weights and biases
W2 = tf.Variable(tf.zeros([2,2]))
b2 = tf.Variable(tf.zeros([2]))
#flow through first layer
hidden_layer = tf.add(tf.matmul(x, W), b)
hidden_layer = tf.nn.softmax(hidden_layer)
#flow through second layer
y_values = tf.add(tf.matmul(hidden_layer, W2), b2)
y = tf.nn.softmax(y_values)
cost = tf.reduce_sum(tf.pow(y_ - y, 2))/(n_samples) #sum of squared errors, averaged over the samples
optimizer = tf.train.AdamOptimizer(alpha).minimize(cost)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(training_epochs):
    sess.run(optimizer, feed_dict = {x: inputX, y_:inputY})
    #log training
    if i % display_step == 0:
        cc = sess.run(cost, feed_dict = {x: inputX, y_:inputY})
        print("Training step:", '%04d' % (i), "cost=", "{:.9f}".format(cc))
print("Optimization Finished!")
training_cost = sess.run(cost, feed_dict = {x: inputX, y_: inputY})
print("Training cost = ", training_cost, "\nW=", sess.run(W), "\nW2=", sess.run(W2),\
"\nb=", sess.run(b), "\nb2=", sess.run(b2))
#check what it thinks when you give it the input data
print(sess.run(y, feed_dict = {x:inputX}))
This tells me that my first-layer weights and biases are all zeros, and that the predictions are now roughly half and half for every training example, much worse than before. Why is only one layer of weights and biases affected? Why doesn't adding a layer improve the model?
Answer 0 (score: 0)
Here are a few suggestions for improving the performance of your model:
1.) Randomly initialized variables usually work better than zeros, at least for the matrix elements. You could try normally distributed variables.
2.) You should normalize your input data, because the two columns are of different orders of magnitude. In principle this should not be a problem, since the weights can be adjusted differently, but with random initialization the network will probably pay attention only to the first column. If you normalize the data, both columns will be of the same order of magnitude.
3.) You should probably increase the number of neurons in the hidden layer to a value of about 10.
With these modifications it worked quite well for me. I have posted a complete working example below:
import tensorflow as tf
import numpy as np
alpha = 0.02
training_epochs = 20000
display_step = 2000
inputX = np.array([[ 2.10400000e+03, 3.00000000e+00],
[ 1.60000000e+03, 3.00000000e+00],
[ 2.40000000e+03, 3.00000000e+00],
[ 1.41600000e+03, 2.00000000e+00],
[ 3.00000000e+03, 4.00000000e+00],
[ 1.98500000e+03, 4.00000000e+00],
[ 1.53400000e+03, 3.00000000e+00],
[ 1.42700000e+03, 3.00000000e+00],
[ 1.38000000e+03, 3.00000000e+00],
[ 1.49400000e+03, 3.00000000e+00]])
n_samples = inputX.shape[0]
# Normalize input data
means = np.mean(inputX, axis=0)
stddevs = np.std(inputX, axis=0)
inputX[:,0] = (inputX[:,0] - means[0]) / stddevs[0]
inputX[:,1] = (inputX[:,1] - means[1]) / stddevs[1]
# Define target labels
inputY = np.array([[1, 0],
[1, 0],
[1, 0],
[0, 1],
[0, 1],
[1, 0],
[0, 1],
[1, 0],
[1, 0],
[1, 0]])
#input and output placeholder, feed data to x, feed labels to y_
x = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 2])
#first layer weights and biases
W = tf.Variable(tf.random_normal([2,10], stddev=1.0/tf.sqrt(2.0)))
b = tf.Variable(tf.zeros([10]))
#second layer weights and biases
W2 = tf.Variable(tf.random_normal([10,2], stddev=1.0/tf.sqrt(2.0)))
b2 = tf.Variable(tf.zeros([2]))
#flow through first layer
hidden_layer = tf.add(tf.matmul(x, W), b)
hidden_layer = tf.nn.softmax(hidden_layer)
#flow through second layer
y_values = tf.add(tf.matmul(hidden_layer, W2), b2)
y = tf.nn.softmax(y_values)
cost = tf.reduce_sum(tf.pow(y_ - y, 2))/(n_samples) #sum of squared errors, averaged over the samples
optimizer = tf.train.AdamOptimizer(alpha).minimize(cost)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(training_epochs):
    sess.run(optimizer, feed_dict = {x: inputX, y_:inputY})
    #log training
    if i % display_step == 0:
        cc = sess.run(cost, feed_dict = {x: inputX, y_:inputY})
        #check what it thinks when you give it the input data
        print(sess.run(y, feed_dict = {x:inputX}))
        print("Training step:", '%04d' % (i), "cost=", "{:.9f}".format(cc))
print("Optimization Finished!")
training_cost = sess.run(cost, feed_dict = {x: inputX, y_: inputY})
print("Training cost = ", training_cost, "\nW=", sess.run(W), "\nW2=", sess.run(W2),\
"\nb=", sess.run(b), "\nb2=", sess.run(b2))
The output then looks very much like the labels:
[[ 1.00000000e+00 2.48446125e-10]
[ 9.99883890e-01 1.16143732e-04]
[ 1.00000000e+00 2.48440435e-10]
[ 1.65703295e-05 9.99983430e-01]
[ 6.65045518e-05 9.99933481e-01]
[ 9.99985337e-01 1.46147468e-05]
[ 1.69444829e-04 9.99830484e-01]
[ 1.00000000e+00 6.85981003e-12]
[ 1.00000000e+00 2.05180339e-12]
[ 9.99865890e-01 1.34040893e-04]]