I have been trying to build a 3-layer (1 input - 2 hidden - 1 output) neural network from scratch with numpy. The output layer has only 1 neuron. I am trying to use stochastic gradient descent with mini-batches. What I have done so far is the following:
import math
import numpy as np

# w0, w1, w2, batch_size, learning_rate, trainims and trainlbls
# are initialized earlier (their shapes are described below)
N1 = 50 # number of neurons in the first hidden layer
N2 = 20 # number of neurons in the second hidden layer
for n in range(10000):
    weight_increment2 = 0
    weight_increment1 = 0
    weight_increment0 = 0
    for j in range( batch_size):
        # choose a sample randomly from training set
        index = np.random.randint( trainims.shape[2] - 1)
        layer0 = trainims[:,:,index]
        # convert it to a column vector and add 1 for bias
        layer0 = layer0.flatten()
        layer0 = np.append( layer0, 1)
        layer0 = layer0.reshape((1025,1))
        # feed forward
        layer1 = np.tanh( np.dot( w0.T, layer0))
        layer1 = np.append( layer1, 1)
        layer1 = layer1.reshape((N1+1,1))
        layer2 = np.tanh( np.dot( w1.T, layer1))
        layer2 = np.append( layer2, 1)
        layer2 = layer2.reshape((N2+1,1))
        layer3 = math.tanh( np.dot( w2.T, layer2))
        # backpropagation
        layer3_error = trainlbls[0,index] - layer3
        layer3_gradient = 1 - layer3 * layer3
        layer3_delta = layer3_error * layer3_gradient
        layer2_error = layer3_delta * w2
        layer2_gradient = 1 - np.multiply(layer2, layer2)
        layer2_delta = np.multiply(layer2_error, layer2_gradient)
        # drop the bias row before propagating further back
        strippedlayer2delta = layer2_delta[0:N2,:]
        layer1_error = strippedlayer2delta.T * w1
        layer1_gradient = 1 - np.multiply( layer1, layer1)
        layer1_delta = np.multiply( layer1_error, layer1_gradient)
        strippedlayer1delta = layer1_delta[0:N1,:]
        # accumulate the weight updates over the mini-batch
        weight_increment2 = weight_increment2 + learning_rate * layer2 * layer3_delta
        weight_increment1 = weight_increment1 + learning_rate * np.dot( layer1, strippedlayer2delta.T)
        weight_increment0 = weight_increment0 + learning_rate * np.dot( layer0, strippedlayer1delta.T)
    # update the weights once per mini-batch
    w2 = w2 + weight_increment2
    w1 = w1 + weight_increment1
    w0 = w0 + weight_increment0
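The loop assumes that w0, w1, w2, batch_size, learning_rate, trainims and trainlbls already exist. A minimal setup sketch with the shapes described below (the initialization scale and the stand-in data are placeholders, not my real values):

import numpy as np

batch_size = 32                        # placeholder value
learning_rate = 0.01                   # placeholder value
# small random weights with the shapes described below
w0 = np.random.randn(1025, 50) * 0.01  # input -> hidden 1
w1 = np.random.randn(51, 20) * 0.01    # hidden 1 -> hidden 2
w2 = np.random.randn(21, 1) * 0.01     # hidden 2 -> output
# stand-ins for the real data: 1900 images of size 32x32 and their labels
trainims = np.random.rand(32, 32, 1900)
trainlbls = np.random.choice([-1.0, 1.0], size=(1, 1900))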
w0 is the input weight matrix: 1025x50 (1024 = inputs, +1 for the bias term)
w1 is the first hidden layer weight matrix: 51x20 (50 = number of neurons in the first hidden layer, +1 for the bias term)
w2 is the second hidden layer weight matrix: 21x1 (20 = number of neurons in the second hidden layer, +1 for the bias term)
layer0 is the input layer vector: an input sample (1024x1) with a 1 appended at the end, making layer0 a 1025x1 vector
layer1 is the first hidden layer vector: after computing tanh( np.dot( w0.T, layer0)) I append a 1 at the end, which gives a 51x1 vector
layer2 is the second hidden layer vector: after computing tanh( np.dot( w1.T, layer1)) I append a 1 at the end, which gives a 21x1 vector
layer3 is the output: tanh( np.dot( w2.T, layer2)), just a single number
trainims is the training set with shape 32x32x1900, i.e. 1900 images of size 32x32

When doing backpropagation I strip the last rows of layer2_delta and layer1_delta, but I don't fully know why I do this. I guess it is because of omitting the 1 terms that I appended to layer1 and layer2.
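If my guess is right, the appended 1 is a constant bias unit: no column of the previous weight matrix produces it, so no error can flow back through it. A small shape sketch of that bookkeeping:

# layer2_delta is (21, 1): one row per unit of layer2, including the
# constant bias unit appended at the end
# w1 is (51, 20): its 20 columns produce only the first 20 units of
# layer2, nothing in w1 produces the bias unit
strippedlayer2delta = layer2_delta[0:N2, :]  # (20, 1), matches w1's columns
# likewise, only the first N1 = 50 rows of layer1_delta correspond to
# the columns of w0 (1025, 50)
strippedlayer1delta = layer1_delta[0:N1, :]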
When I try to run the code, I get the following error:

ValueError Traceback (most recent call last)
<ipython-input-11-98c3beb3b346> in <module>()
     62 print( "weight_increment1:", weight_increment1.shape)
     63
---> 64 weight_increment0 = weight_increment0 + learning_rate * np.dot( layer0, strippedlayer1delta.T)
     65 print( "weight_increment0:", weight_increment0.shape)
     66
ValueError: shapes (1025,1) and (20,50) not aligned: 1 (dim 1) != 20 (dim 0)

I know the shapes should match so that I get a 1025x50 matrix and can update w0, but I cannot come up with a solution. Maybe my backpropagation is wrong, I don't know. Can someone suggest a fix for this problem?
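Tracing the shapes by hand, this seems to be where the mismatch comes from (comments only, following the code above):

# strippedlayer2delta.T -> (1, 20), w1 -> (51, 20)
# layer1_error = strippedlayer2delta.T * w1 broadcasts to (51, 20), a matrix
# layer1_gradient is (51, 1), so layer1_delta becomes (51, 20)
# strippedlayer1delta -> (50, 20), strippedlayer1delta.T -> (20, 50)
# np.dot( layer0, strippedlayer1delta.T) is then (1025, 1) x (20, 50),
# which is exactly the "not aligned" ValueError above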
EDIT: OK, I noticed my mistake. Previously I was doing

layer1_error = strippedlayer2delta.T * w1

which gives a matrix, but the error must be a vector like the layer itself, not a matrix. Changing this line to

layer1_error = np.dot( w1, strippedlayer2delta)

fixed the shape error.
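With that change the shapes line up all the way back:

# layer1_error = np.dot( w1, strippedlayer2delta): (51, 20) x (20, 1) -> (51, 1)
# layer1_delta -> (51, 1), strippedlayer1delta -> (50, 1)
# np.dot( layer0, strippedlayer1delta.T): (1025, 1) x (1, 50) -> (1025, 50),
# which matches w0, so the update goes through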
Still, could someone check whether my implementation of mini-batch stochastic gradient descent and backpropagation is correct?
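In the meantime, one way I could verify the deltas numerically is a finite-difference check. A minimal sketch, assuming the implicit loss is the squared error 0.5*(t - y)**2 (which is what the (t - y)*(1 - y*y) delta corresponds to); the helpers forward, loss and numgrad_w0 are made up for this check:

import math
import numpy as np

def forward(w0, w1, w2, x):
    # x is a (1025, 1) column with the bias 1 already appended
    l1 = np.tanh(np.dot(w0.T, x))
    l1 = np.vstack([l1, [[1.0]]])  # append the bias unit
    l2 = np.tanh(np.dot(w1.T, l1))
    l2 = np.vstack([l2, [[1.0]]])  # append the bias unit
    return math.tanh(np.dot(w2.T, l2).item())

def loss(w0, w1, w2, x, t):
    y = forward(w0, w1, w2, x)
    return 0.5 * (t - y) ** 2

def numgrad_w0(w0, w1, w2, x, t, i, j, eps=1e-5):
    # central difference with respect to a single entry of w0
    wp = w0.copy(); wp[i, j] += eps
    wm = w0.copy(); wm[i, j] -= eps
    return (loss(wp, w1, w2, x, t) - loss(wm, w1, w2, x, t)) / (2 * eps)

# the analytic step above moves w0[i, j] by
#   +learning_rate * layer0[i] * strippedlayer1delta[j],
# so the analytic gradient of the loss is -layer0[i] * strippedlayer1delta[j];
# numgrad_w0(w0, w1, w2, layer0, t, i, j) should agree with it closely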