I am writing a stochastic gradient descent function with ridge regression. I keep the step size constant for the first 1800 iterations and then change it to 1/n or 1/sqrt(n). When I use 1/sqrt(n), my loss decreases and nearly converges. However, when I use 1/n, my loss decreases and then starts to increase! Can anyone help me? Below is the SGD code, along with the function I use to compute the loss on the entire batch after each update.
import numpy as np
import pandas as pd

def stochastic_grad_descent_steptrial(x, y, thetha, alpha, num_iter, lambda_reg):
    N = x.shape[0]                          # number of samples
    loss_log = []
    theta_log = []
    ridge_log = []
    total_loss_log = []                     # full-batch loss after each update (for plotting)
    total_iter_count = []
    for j in range(num_iter):               # j epochs
        for i in range(N):                  # one pass over the samples
            diff = np.dot(x.iloc[i, :], thetha) - y[i]                        # residual for sample i
            loss = np.sum(diff**2) + lambda_reg*np.sum(thetha**2)             # per-sample ridge loss
            loss_log.append(loss)
            grad = (2/N)*np.dot(x.iloc[i, :].T, diff) + 2*lambda_reg*thetha   # gradient for sample i
            total_iter = j*N + i + 1        # total step number so far
            total_iter_count.append(total_iter)
            if total_iter < 1800:           # switch to the step-size function only after 1800 steps
                step = alpha                # the switch point and schedule are hyperparameters
            else:
                step = 1/total_iter
                # step = 1/np.sqrt(total_iter)
            thetha = thetha - step*grad     # update
            theta_log.append(thetha)
            ridge_log.append(lambda_reg*np.sum(thetha**2))
            total_loss = ridge_loss(x, y, thetha, lambda_reg)   # loss on the entire data set
            total_loss_log.append(total_loss)
    normal_loss = cost(x, y, thetha)        # final loss (without the ridge penalty)
    loss_log = np.array(loss_log)           # conversions to np/pd
    theta_log = pd.DataFrame(theta_log)
    ridge_log = np.array(ridge_log)
    return (loss_log, theta_log, ridge_log, thetha, normal_loss, total_loss_log)
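For context on the two schedules in the else branch above: at the switch point, step 1800, 1/total_iter is already about 5.6e-4 and keeps shrinking much faster than 1/sqrt(total_iter). A standalone snippet (not part of the training code) to print the raw step sizes:

import numpy as np

# Compare the two decaying schedules at a few step counts
for t in [1800, 5000, 20000, 100000]:
    print(t, 1/t, 1/np.sqrt(t))   # 1/t shrinks much faster than 1/sqrt(t)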
My loss function (which I then plot) is:
def ridge_loss(x, y, thetha, lambda_reg):
    diff = np.dot(x, thetha) - y            # residuals over the whole data set
    cost = (1/x.shape[0])*np.sum(diff**2) + lambda_reg*np.sum(thetha**2)
    return cost
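As a quick sanity check of ridge_loss, here is a tiny hand-checkable input (the numbers are purely illustrative):

import numpy as np

x_small = np.array([[1.0, 0.0],
                    [0.0, 1.0]])
y_small = np.array([1.0, 2.0])
theta_small = np.array([0.5, 0.5])

# residuals: [0.5-1, 0.5-2] = [-0.5, -1.5]; MSE = (0.25 + 2.25)/2 = 1.25
# penalty: 0.1*(0.25 + 0.25) = 0.05; total = 1.30
print(ridge_loss(x_small, y_small, theta_small, lambda_reg=0.1))  # -> 1.3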
Why is the loss increasing? (See the plots below.) How can the loss go up? Shouldn't it just keep decreasing, only more slowly?
Image 1 uses the constant step size, image 2 uses 1/sqrt(iteration count), and image 3 uses 1/(iteration count).
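In case it helps, here is a minimal self-contained driver that reproduces my setup on synthetic data, assuming the two functions above are defined. The data, shapes, and hyperparameter values are made up for illustration, and cost below is only a stand-in for my unregularized-loss helper, which I have not shown:

import numpy as np
import pandas as pd

def cost(x, y, thetha):                     # stand-in for the unshown unregularized-loss helper
    diff = np.dot(x, thetha) - y
    return (1/x.shape[0])*np.sum(diff**2)

rng = np.random.RandomState(0)
x = pd.DataFrame(rng.randn(200, 5))         # 200 samples, 5 features (synthetic)
true_theta = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = x.values.dot(true_theta) + 0.1*rng.randn(200)

# 20 epochs x 200 samples = 4000 steps, so the schedule switch at step 1800 is crossed
out = stochastic_grad_descent_steptrial(x, y, np.zeros(5), alpha=0.01,
                                        num_iter=20, lambda_reg=0.01)
loss_log, theta_log, ridge_log, thetha, normal_loss, total_loss_log = out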