I'm trying to add regularization to my MNIST digit NN classifier, which I built with numpy and vanilla Python. I'm currently using sigmoid activations with a cross-entropy cost function.

Without the regularizer I get 97% accuracy.

However, as soon as I add the regularizer, I only get around 11%, despite trying different hyperparameters. I've tried different learning rates:

.001, .1, 1

and different lambd values, such as:

.5, .8, 1.0, 2.0, etc.

I can't figure out what mistake I'm making. Have I perhaps missed a step?

The only change I made was to the derivatives of the weights. I've implemented the gradients as follows (the cost form and update step they assume are spelled out right after the code):
def calculate_gradients(self, x, y, lambd):
    '''Calculate all gradients with respect to
    cost. Here our cost function is cross_entropy.
    last_layer_z_error = dC/dZ (z is the logit).
    All weight gradients also include regularization gradients.
    x.shape[0] = sample size
    '''
    ##### First we calculate the output layer gradients #####
    gradients, activations, zs = self.gather_backprop_data(x, y)

    # gradient of cost with respect to Z of the last layer
    last_layer_z_error = activations[-1] - y

    # updating the weight derivatives of the final layer
    gradients['w' + str(self.num_layers - 1)] = np.dot(activations[-2].T, last_layer_z_error)/x.shape[0] \
        + (lambd/x.shape[0])*self.parameters['w' + str(self.num_layers - 1)]
    gradients['b' + str(self.num_layers - 1)] = np.mean(last_layer_z_error, axis=0)
    gradients['b' + str(self.num_layers - 1)] = np.expand_dims(gradients['b' + str(self.num_layers - 1)], 0)

    ### HIDDEN LAYER GRADIENTS ###
    z_previous_layer = last_layer_z_error
    for i in reversed(range(1, self.num_layers - 1)):
        z_previous_layer = np.dot(z_previous_layer, self.parameters['w' + str(i + 1)].T) \
            * sigmoid_derivative(zs[i - 1])
        gradients['w' + str(i)] = np.dot(activations[i - 1].T, z_previous_layer)/x.shape[0] \
            + (lambd/x.shape[0])*self.parameters['w' + str(i)]
        gradients['b' + str(i)] = np.mean(z_previous_layer, axis=0)
        gradients['b' + str(i)] = np.expand_dims(gradients['b' + str(i)], 0)

    return gradients
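For reference, the textbook L2-regularized cross-entropy cost that the (lambd/x.shape[0]) terms above come from is (writing m for x.shape[0], and assuming the penalty in my cost function uses the same m):

$$C = -\frac{1}{m}\sum_i \big[\, y_i \ln a_i + (1 - y_i)\ln(1 - a_i) \,\big] + \frac{\lambda}{2m}\sum_w w^2,
\qquad
\frac{\partial C}{\partial w} = \frac{1}{m}\, a_{\text{prev}}^{T}\, \delta + \frac{\lambda}{m}\, w$$

And this is roughly how the gradients get applied afterwards; a minimal sketch of the update step, assuming plain gradient descent (update_parameters and lr are placeholder names for illustration, not the exact code from my notebook):

def update_parameters(self, gradients, lr):
    # Plain gradient descent. The L2 term is already folded into
    # gradients['w...'] above, so the weights should not be decayed
    # a second time here.
    for i in range(1, self.num_layers):
        self.parameters['w' + str(i)] -= lr * gradients['w' + str(i)]
        self.parameters['b' + str(i)] -= lr * gradients['b' + str(i)]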
The whole code can be found here:

I've uploaded the entire notebook to GitHub, if needed: