I have a simple structure that I learned from one of Siraj Raval's videos on a single-layer perceptron in TensorFlow, and I am trying to extend it to a larger number of layers, which is where I am running into difficulty.
The first example has 2 inputs and 2 outputs: one set of weights and biases is applied, and then softmax is applied to the output.
The second example has 2 inputs and 2 outputs with a hidden layer (2 units) in between, so there are two sets of weights and biases, and softmax is applied after each of them.
I am trying to extend the simple case to an N-hidden-layer case, but with limited success: when I add extra layers, they seem to be ignored by the optimizer.
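Roughly, the N-layer version I have in mind just stacks the same matmul/softmax pattern in a loop. The sketch below only shows the idea; the layer sizes and the random_normal initialization are placeholders for illustration, not the exact code I ran:

import tensorflow as tf

# sketch only: stack len(layer_sizes)-1 weight/bias pairs, applying softmax after each layer
layer_sizes = [2, 4, 4, 2]   # input width, two hidden layers, output width (illustrative values)
x = tf.placeholder(tf.float32, [None, layer_sizes[0]])
y_ = tf.placeholder(tf.float32, [None, layer_sizes[-1]])

activation = x
for i in range(len(layer_sizes) - 1):
    W_i = tf.Variable(tf.random_normal([layer_sizes[i], layer_sizes[i + 1]]))
    b_i = tf.Variable(tf.zeros([layer_sizes[i + 1]]))
    activation = tf.nn.softmax(tf.add(tf.matmul(activation, W_i), b_i))
y = activation   # prediction of the final layer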
The input is of the format:
inputX = np.array([[ 2.10400000e+03, 3.00000000e+00],
[ 1.60000000e+03, 3.00000000e+00],
[ 2.40000000e+03, 3.00000000e+00],
[ 1.41600000e+03, 2.00000000e+00],
[ 3.00000000e+03, 4.00000000e+00],
[ 1.98500000e+03, 4.00000000e+00],
[ 1.53400000e+03, 3.00000000e+00],
[ 1.42700000e+03, 3.00000000e+00],
[ 1.38000000e+03, 3.00000000e+00],
[ 1.49400000e+03, 3.00000000e+00]])
and the output labels are of the format:
inputY = np.array([[1, 0],
[1, 0],
[1, 0],
[0, 1],
[0, 1],
[1, 0],
[0, 1],
[1, 0],
[1, 0],
[1, 0]])
My snippet of code executes correctly (the dependencies are numpy and tensorflow):
import numpy as np
import tensorflow as tf

# hyperparameters: my full script sets these elsewhere; the values below are placeholders
# so that this excerpt is self-contained
alpha = 0.001
training_epochs = 1000
display_step = 50
n_samples = inputX.shape[0]

#input and output placeholder, feed data to x, feed labels to y_
x = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 2])
#first layer weights and biases
W = tf.Variable(tf.zeros([2,2]))
b = tf.Variable(tf.zeros([2]))
# vector form of x*W + b
y_values = tf.add(tf.matmul(x, W), b)
#activation function
y = tf.nn.softmax(y_values)
cost = tf.reduce_sum(tf.pow(y_ - y, 2))/(n_samples) #sum of squared errors, averaged over the samples
optimizer = tf.train.AdamOptimizer(alpha).minimize(cost)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(training_epochs):
    sess.run(optimizer, feed_dict = {x: inputX, y_:inputY})
    #log training
    if i % display_step == 0:
        cc = sess.run(cost, feed_dict = {x: inputX, y_:inputY})
        print("Training step:", '%04d' % (i), "cost=", "{:.9f}".format(cc))
print("Optimization Finished!")
training_cost = sess.run(cost, feed_dict = {x: inputX, y_: inputY})
print("Training cost = ", training_cost, "\nW=", sess.run(W), "\nb=", sess.run(b))
#check what it thinks when you give it the input data
print(sess.run(y, feed_dict = {x:inputX}))
and I get the output:
W= [[ 0.00021142 -0.00021142]
[ 0.00120122 -0.00120122]]
b= [ 0.00103542 -0.00103542]
label_predictions = [[ 0.71073025 0.28926972]
[ 0.66503692 0.33496314]
[ 0.73576927 0.2642307 ]
[ 0.64694035 0.35305965]
[ 0.78248388 0.21751612]
[ 0.70078063 0.2992194 ]
[ 0.65879178 0.34120819]
[ 0.6485498 0.3514502 ]
[ 0.64400673 0.3559933 ]
[ 0.65497971 0.34502029]]
Not great, so I wanted to try increasing the number of layers to see whether that would improve things. I added an extra layer, using new variables W2, b2, and hidden_layer:
# same imports and hyperparameters (alpha, training_epochs, display_step, n_samples) as above
#input and output placeholder, feed data to x, feed labels to y_
x = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 2])
#first layer weights and biases
W = tf.Variable(tf.zeros([2,2]))
b = tf.Variable(tf.zeros([2]))
#second layer weights and biases
W2 = tf.Variable(tf.zeros([2,2]))
b2 = tf.Variable(tf.zeros([2]))
#flow through first layer
hidden_layer = tf.add(tf.matmul(x, W), b)
hidden_layer = tf.nn.softmax(hidden_layer)
#flow through second layer
y_values = tf.add(tf.matmul(hidden_layer, W2), b2)
y = tf.nn.softmax(y_values)
cost = tf.reduce_sum(tf.pow(y_ - y, 2))/(n_samples) #sum of squared errors, averaged over the samples
optimizer = tf.train.AdamOptimizer(alpha).minimize(cost)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(training_epochs):
    sess.run(optimizer, feed_dict = {x: inputX, y_:inputY})
    #log training
    if i % display_step == 0:
        cc = sess.run(cost, feed_dict = {x: inputX, y_:inputY})
        print("Training step:", '%04d' % (i), "cost=", "{:.9f}".format(cc))
print("Optimization Finished!")
training_cost = sess.run(cost, feed_dict = {x: inputX, y_: inputY})
print("Training cost = ", training_cost, "\nW=", sess.run(W), "\nW2=", sess.run(W2),\
"\nb=", sess.run(b), "\nb2=", sess.run(b2))
#check what it thinks when you give it the input data
print(sess.run(y, feed_dict = {x:inputX}))
This tells me that my first-layer weights and biases are all zeros, and that the predictions are now roughly half and half for every training example, much worse than before. Why is only one layer of weights and biases affected? Why doesn't adding a layer improve the model?
Answer 0 (score: 0)
Here are a few suggestions for improving the performance of your model:
1.) Randomly initialized variables usually work better than zeros, at least for the matrix elements. You could try normally distributed variables.
2.) You should normalize your input data, because the two columns are of different orders of magnitude. In principle this should not be a problem, since the weights can be adjusted differently, but with random initialization the network will probably pay attention only to the first column. If you normalize the data, both columns will be of the same order of magnitude.
3.) You should probably increase the number of neurons in the hidden layer to a value of about 10.
With these modifications it worked quite well for me. I have posted a complete working example below:
import tensorflow as tf
import numpy as np
alpha = 0.02
training_epochs = 20000
display_step = 2000
inputX = np.array([[ 2.10400000e+03, 3.00000000e+00],
[ 1.60000000e+03, 3.00000000e+00],
[ 2.40000000e+03, 3.00000000e+00],
[ 1.41600000e+03, 2.00000000e+00],
[ 3.00000000e+03, 4.00000000e+00],
[ 1.98500000e+03, 4.00000000e+00],
[ 1.53400000e+03, 3.00000000e+00],
[ 1.42700000e+03, 3.00000000e+00],
[ 1.38000000e+03, 3.00000000e+00],
[ 1.49400000e+03, 3.00000000e+00]])
n_samples = inputX.shape[0]
# Normalize input data
means = np.mean(inputX, axis=0)
stddevs = np.std(inputX, axis=0)
inputX[:,0] = (inputX[:,0] - means[0]) / stddevs[0]
inputX[:,1] = (inputX[:,1] - means[1]) / stddevs[1]
# Define target labels
inputY = np.array([[1, 0],
[1, 0],
[1, 0],
[0, 1],
[0, 1],
[1, 0],
[0, 1],
[1, 0],
[1, 0],
[1, 0]])
#input and output placeholder, feed data to x, feed labels to y_
x = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 2])
#first layer weights and biases
W = tf.Variable(tf.random_normal([2,10], stddev=1.0/tf.sqrt(2.0)))
b = tf.Variable(tf.zeros([10]))
#second layer weights and biases
W2 = tf.Variable(tf.random_normal([10,2], stddev=1.0/tf.sqrt(2.0)))
b2 = tf.Variable(tf.zeros([2]))
#flow through first layer
hidden_layer = tf.add(tf.matmul(x, W), b)
hidden_layer = tf.nn.softmax(hidden_layer)
#flow through second layer
y_values = tf.add(tf.matmul(hidden_layer, W2), b2)
y = tf.nn.softmax(y_values)
cost = tf.reduce_sum(tf.pow(y_ - y, 2))/(n_samples) #sum of squared errors, averaged over the samples
optimizer = tf.train.AdamOptimizer(alpha).minimize(cost)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(training_epochs):
    sess.run(optimizer, feed_dict = {x: inputX, y_:inputY})
    #log training
    if i % display_step == 0:
        cc = sess.run(cost, feed_dict = {x: inputX, y_:inputY})
        #check what it thinks when you give it the input data
        print(sess.run(y, feed_dict = {x:inputX}))
        print("Training step:", '%04d' % (i), "cost=", "{:.9f}".format(cc))
print("Optimization Finished!")
training_cost = sess.run(cost, feed_dict = {x: inputX, y_: inputY})
print("Training cost = ", training_cost, "\nW=", sess.run(W), "\nW2=", sess.run(W2),\
"\nb=", sess.run(b), "\nb2=", sess.run(b2))
The output then looks very much like the labels:
[[ 1.00000000e+00 2.48446125e-10]
[ 9.99883890e-01 1.16143732e-04]
[ 1.00000000e+00 2.48440435e-10]
[ 1.65703295e-05 9.99983430e-01]
[ 6.65045518e-05 9.99933481e-01]
[ 9.99985337e-01 1.46147468e-05]
[ 1.69444829e-04 9.99830484e-01]
[ 1.00000000e+00 6.85981003e-12]
[ 1.00000000e+00 2.05180339e-12]
[ 9.99865890e-01 1.34040893e-04]]