I'm working through Andrew Ng's ML course on Coursera and recently had the chance to apply TensorFlow to a real-world setting, at which point things fell apart fast :D.
My code works in the sense that it doesn't throw any errors... but the cost never changes either. From what I've read, and as far as I understand it, this is because the gradients are all zero. The problem is, I don't understand why that's happening or how to fix it...
The problem I'm working on is a regression model, implemented as a shallow neural network with a single 10-unit hidden layer, for estimating daily sales across a large portfolio of properties.
So, the code:
import tensorflow as tf

def create_placeholders():
    X = tf.placeholder(tf.float32, [13, None], name="X")
    Y = tf.placeholder(tf.float32, [1, None], name="Y")
    return X, Y
# Initialise weights and biases. Hidden layer of 10, output layer of 1
def initialise_parameters():
    W1 = tf.get_variable("W1", [10, 13], initializer=tf.contrib.layers.xavier_initializer(seed=1))
    b1 = tf.get_variable("b1", [10, 1], initializer=tf.zeros_initializer())
    W2 = tf.get_variable("W2", [1, 10], initializer=tf.contrib.layers.xavier_initializer(seed=1))
    b2 = tf.get_variable("b2", [1, 1], initializer=tf.zeros_initializer())
    parameters = {"W1": W1, "b1": b1, "W2": W2, "b2": b2}
    return parameters
def forward_propagation(X, parameters):
    # Retrieve the parameters from the dictionary "parameters"
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    Z1 = tf.add(tf.matmul(W1, X), b1)
    A1 = tf.nn.relu(Z1)
    Z2 = tf.add(tf.matmul(W2, A1), b2)
    A2 = tf.nn.relu(Z2)
    # ReLU on the output because a) why not, and b) it's sales data, so negatives are nonsense.
    return A2
# COST FUNCTION
def compute_cost(y_hat, y):
    # Originally tried reduce_mean, but it gave the same outcome of a constant cost
    cost = tf.reduce_sum(tf.square(y_hat - y))
    return cost
# CHECK DIMENSIONS ARE ALL GOOD
[x.shape for x in [X_train, Y_train, X_test, Y_test]]
Out[48]: [(13, 331768), (1, 331768), (13, 41471), (1, 41471)]
# W1 shape = (10, 13), W1*X = (10,13) * (13, m) = (10, m)
# W2 shape = (1, 10), W2 * (W1*X) = (1,10) * (10, m) = (1,m)
# good
# The X's have all been normalised using sklearn.preprocessing.StandardScaler()
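The scaling step was along these lines (a sketch; X_train_raw is a stand-in name of mine for the unscaled (m, 13) feature matrix, so the exact shapes are an assumption):

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# StandardScaler normalises column-wise, so scale the (m, 13) matrix
# first and transpose afterwards to get the (13, m) layout used above.
X_train = scaler.fit_transform(X_train_raw).T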
learning_rate = 0.0001
tf.reset_default_graph()
costs = [] # To keep track of the cost
X, Y = create_placeholders()
# Initialize parameters
parameters = initialise_parameters()
y_hat = forward_propagation(X, parameters)
cost = compute_cost(y_hat, Y)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)
# Alternative
# optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
init = tf.global_variables_initializer()
with tf.Session() as sess:
    # Run the initialization
    sess.run(init)
    # Do the training loop.
    # The range is small here, but I have gone up to 2000 with the same result.
    for epoch in range(10):
        _, epoch_cost = sess.run([optimizer, cost], feed_dict={X: X_train, Y: Y_train})
        print(epoch_cost)
Output:
8.932417e+17
9.986363e+32
8.9324173e+17
8.9324173e+17
8.9324173e+17
8.9324173e+17
8.9324173e+17
8.9324173e+17
8.9324173e+17
8.9324173e+17
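In case it helps, this is roughly how I'd expect the zero-gradient suspicion could be confirmed directly (a sketch; the variable names here are mine, and tf.gradients returns d(cost)/d(var) for each listed variable):

import numpy as np

grads = tf.gradients(cost, [parameters["W1"], parameters["b1"],
                            parameters["W2"], parameters["b2"]])
with tf.Session() as sess:
    sess.run(init)
    sess.run(optimizer, feed_dict={X: X_train, Y: Y_train})  # one update step
    grad_vals = sess.run(grads, feed_dict={X: X_train, Y: Y_train})
    print([float(np.abs(g).max()) for g in grad_vals])       # all ~0 would confirm it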
Where am I going wrong? What is causing the gradients to be zero? Thanks!
Sample data:
pd.DataFrame(X_train).iloc[:, :3]
Out[38]:
0 1 2
0 -2.639472 1.567537 0.000027
1 0.611774 0.377617 -0.317175
2 0.228070 0.095461 0.129151
3 0.098627 0.924450 -1.553019
4 0.428524 1.053880 -0.661481
5 -0.068319 1.084136 2.015653
6 0.788444 0.882707 -0.555907
7 0.450732 1.168514 -0.576461
8 0.438508 -0.281488 -0.338060
9 0.241477 -0.014153 -0.582811
10 0.248324 0.020088 -0.174162
11 0.007352 -1.261135 1.666470
12 0.134369 0.191141 0.235396
pd.DataFrame(y_train).iloc[:, :3]
Out[43]:
0 1 2
0 279637.377287 2.796952e+06 57194.231303
Showing that the row means (rows = features, columns = observations) are zero:
pd.DataFrame(X_train).mean(axis=1)
Out[41]:
0 5.572480e-12
1 2.818636e-14
2 -3.316111e-17
3 -2.587781e-12
4 3.080923e-13
5 1.078905e-13
6 -4.661424e-15
7 1.534782e-13
8 -9.438911e-13
9 2.328588e-14
10 -2.219427e-14
11 -9.047083e-15
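And similarly, the row standard deviations can be sanity-checked (after StandardScaler I'd expect these to all be ~1):

pd.DataFrame(X_train).std(axis=1)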