I wrote a simple neural network in Python to test my understanding of how neural networks work. My problem is that the network's loss does not decrease during training. I'm using the sum of squares as the loss function. Could someone with neural network experience point out where I went wrong?
Details:
I'm trying to train the network on a simple 2D dataset where each point belongs to one of two classes. My dataset is like the ones used in the interactive neural network tool that TensorFlow provides.
I'm using a single hidden layer with three hidden units and an output layer with one unit. The loss function is the sum of squares. I used the following sources to help me understand how backpropagation works:
http://cs231n.stanford.edu/vecDerivs.pdf
http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture4.pdf
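To make the architecture concrete, here is a minimal shape sketch of a 2 -> 3 -> 1 sigmoid network on 6 points (illustration only; my actual code follows below):

import numpy as np

# Illustration only: shapes through a 2 -> 3 -> 1 sigmoid network on 6 points.
X = np.random.rand(6, 2)                   # 6 points, 2 features
W1 = np.random.rand(2, 3)                  # input -> hidden (3 units)
W2 = np.random.rand(3, 1)                  # hidden -> output (1 unit)
hidden = 1 / (1 + np.exp(-(X @ W1)))       # shape (6, 3)
output = 1 / (1 + np.exp(-(hidden @ W2)))  # shape (6, 1)
print(hidden.shape, output.shape)          # (6, 3) (6, 1)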
import numpy as np
import matplotlib.pyplot as plt
data_x = np.array([np.array([3,2,1,1,2,3]), np.array([4,6,6,1,1,1])])
data_x = data_x.transpose()
y_data = np.array([0,0,0,1,1,1]) # 0 is blue, 1 is red
weights = []
layers_hidden_units = [3,1]
caches = [0] * (len(layers_hidden_units) + 1) # list holding the a_i (activation) values used during backpropagation
caches[0] = data_x
zs = [0] * len(layers_hidden_units) # list holding z values
bs = []
learning_steps = 10000
learning_rate = .001
def init_weights():
    for i in range(len(layers_hidden_units)):
        if (i == 0):
            bs.append(np.zeros(layers_hidden_units[0]))
            weights.append(np.random.rand(data_x.shape[1], layers_hidden_units[0]))
        else:
            bs.append(np.zeros(layers_hidden_units[i]))
            weights.append(np.random.rand(layers_hidden_units[i - 1], layers_hidden_units[i]))

init_weights()

def calculate_z(a_prev, W, b):
    Z = np.dot(a_prev, W) + b
    return Z  # shape is (m, a)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sum_of_squares_loss(y_hat):
    return np.sum(np.power(y_hat - y_data, 2))

def forward_propagation_step(a_prev, W, b, step):
    z_curr = np.dot(a_prev, W) + b
    a_curr = sigmoid(z_curr)
    zs[step] = z_curr
    caches[step + 1] = a_curr
    return a_curr

def forward_propagate(data_x):
    a = data_x
    for i in range(len(layers_hidden_units)):
        a = forward_propagation_step(a, weights[i], bs[i], i)
    print("sum of squares loss: " + str(sum_of_squares_loss(a)))
    return a

def backpropagate_step2(z_prev, W, a_prev, back_value, b):
    dz = np.multiply(sigmoid(z_prev) * (1 - sigmoid(z_prev)), back_value)
    back = np.dot(dz, W.transpose())
    w, b = gradient_descent2(a_prev, z_prev, dz, back_value, W, b)
    return (w, b, back)

def gradient_descent2(a_prev, z_prev, dz, back_value, W, b):
    dW = np.dot(a_prev.transpose(), dz)
    db = dz
    W = W - learning_rate * dW
    b = b - learning_rate * db
    return (W, b)

def backpropagate2():
    a_last = caches[len(caches) - 1]
    z_last = zs[len(zs) - 1]
    back = 2 * np.sum(y_data - a_last)
    for i in range(len(caches) - 2, -1, -1):
        W, b, back = backpropagate_step2(zs[i], weights[i], caches[i], back, bs[i])
        weights[i] = W
        bs[i] = b

def train():
    for i in range(learning_steps):
        forward_propagate(data_x)
        backpropagate2()

def predict():
    x_value = np.array([np.array([1]), np.array([5])])
    x_value = x_value.transpose()
    return forward_propagate(x_value)
train()
prediction = predict()
print("prediction: " + str(prediction))
print("weights: " + str(weights))
print("b's: " + str(bs))
Edit
Loss function output at each training step:
sum of squares loss: 9.41904563854931
sum of squares loss: 9.466209959933774
sum of squares loss: 9.521526062849716
sum of squares loss: 9.586899865148004
sum of squares loss: 9.664794367157919
sum of squares loss: 9.758420389666265
sum of squares loss: 9.871998308092504
sum of squares loss: 10.01111579994052
sum of squares loss: 10.183210868157662
sum of squares loss: 10.398206323346686
sum of squares loss: 10.669295283053922
sum of squares loss: 11.01378465075701
sum of squares loss: 11.453638064297865
sum of squares loss: 12.014673540964441
sum of squares loss: 12.72180795275546
sum of squares loss: 13.585099304548
sum of squares loss: 14.571325340888313
sum of squares loss: 15.575875071643217
sum of squares loss: 16.450858359675404
sum of squares loss: 17.09492556599039
sum of squares loss: 17.50379115136118
sum of squares loss: 17.737928687252197
sum of squares loss: 17.86441528900727
sum of squares loss: 17.93065857787634
sum of squares loss: 17.964762961628182
sum of squares loss: 17.982154959175745
sum of squares loss: 17.99097886266222
sum of squares loss: 17.995443731332028
sum of squares loss: 17.997699846969052
sum of squares loss: 17.998839078868475
sum of squares loss: 17.99941413509295
sum of squares loss: 17.999704357817745
sum of squares loss: 17.999850815993767
sum of squares loss: 17.999924721397637
sum of squares loss: 17.99996201452985
sum of squares loss: 17.999980832662573
sum of squares loss: 17.99999032824638
sum of squares loss: 17.999995119680737
sum of squares loss: 17.999997537416142
sum of squares loss: 17.999998757393232
sum of squares loss: 17.999999372987293
sum of squares loss: 17.99999968361277
sum of squares loss: 17.999999840352714
sum of squares loss: 17.999999919442843
sum of squares loss: 17.9999999593513
sum of squares loss: 17.999999979488887
sum of squares loss: 17.999999989650206
sum of squares loss: 17.99999999477755
sum of squares loss: 17.99999999736478
sum of squares loss: 17.999999998670287
sum of squares loss: 17.99999999932903
sum of squares loss: 17.999999999661433
sum of squares loss: 17.999999999829164
sum of squares loss: 17.999999999913797
sum of squares loss: 17.999999999956504
sum of squares loss: 17.99999999997805
sum of squares loss: 17.999999999988926
sum of squares loss: 17.99999999999441
sum of squares loss: 17.99999999999718
sum of squares loss: 17.99999999999858
sum of squares loss: 17.999999999999282
sum of squares loss: 17.999999999999638
sum of squares loss: 17.999999999999815
sum of squares loss: 17.999999999999908
sum of squares loss: 17.99999999999995
sum of squares loss: 17.99999999999998
sum of squares loss: 17.999999999999993
sum of squares loss: 17.999999999999993
sum of squares loss: 18.0
sum of squares loss: 18.0
sum of squares loss: 18.0
sum of squares loss: 18.0
sum of squares loss: 18.0
sum of squares loss: 18.0
Edit 2
Input data:
3,4 -> 0
2,6 -> 0
1,6 -> 0
1,1 -> 1
2,1 -> 1
3,1 -> 1
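For reference, these are the same values that data_x and y_data hold in the code above, written one row per point (a restatement for clarity, not part of my program):

import numpy as np

# One row per (x, y) point; labels in the same order as the list above.
data_x = np.array([[3, 4], [2, 6], [1, 6], [1, 1], [2, 1], [3, 1]])
y_data = np.array([0, 0, 0, 1, 1, 1])  # 0 = blue, 1 = red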
Answer (score: 0)
The error seems to be in this line in the function backpropagate_step2:
dz = np.multiply(sigmoid(z_prev) - (1 - sigmoid(z_prev)), back_value)
It should be
dz = np.multiply(sigmoid(z_prev)*(1 - sigmoid(z_prev)), back_value)
because the derivative of sigmoid(x) is sigmoid(x)*(1 - sigmoid(x)), not sigmoid(x) - (1 - sigmoid(x)).
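A quick finite-difference check makes the difference concrete (a standalone sketch, independent of the code in the question):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Compare both expressions against a central finite difference.
z = np.linspace(-4, 4, 9)
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
product_form = sigmoid(z) * (1 - sigmoid(z))     # correct derivative
difference_form = sigmoid(z) - (1 - sigmoid(z))  # the buggy expression
print(np.max(np.abs(product_form - numeric)))    # ~1e-11: matches
print(np.max(np.abs(difference_form - numeric))) # ~1: clearly wrong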