I am writing a stochastic gradient descent function with ridge regression. I keep the step size constant for the first 1700 iterations and then switch it to 1/n or 1/sqrt(n). When I use 1/sqrt(n), my loss decreases and almost converges. However, when I use 1/n, the loss decreases and then starts to increase! Can someone please help me? Below is the code for the SGD, followed by the function I use to compute the loss over the whole batch after each update.
import numpy as np
import pandas as pd

def stochastic_grad_descent_step(x, y, thetha, alpha, num_iter, lambda_reg):
    # Step size is held at alpha at first, then switched to a decaying schedule
    N = x.shape[0]       # number of training samples
    loss_log = []        # per-sample loss at each step i
    theta_log = []
    ridge_log = []
    total_loss_log = []  # full-batch loss after each update (for plotting)
    for j in range(num_iter):
        for i in range(x.shape[0]):
            diff = np.dot(x.iloc[i, :], thetha) - y[i]  # residual for sample i
            loss = np.sum(diff**2) + lambda_reg*np.sum(thetha**2)
            loss_log.append(loss)
            grad = (2/N)*np.dot(x.iloc[i, :].T, diff) + 2*lambda_reg*thetha  # gradient for sample i
            total_iter = j*x.shape[0] + i + 1  # total step number so far
            if total_iter < 1800:  # switch to the step-size schedule only after n steps
                step = alpha       # schedule and switch point can be treated as hyperparameters
            else:
                step = 1/(total_iter - 1700)
                # step = 1/np.sqrt(total_iter - 1700)
            thetha = thetha - step*grad
            theta_log.append(thetha)
            ridge_log.append(lambda_reg*np.sum(thetha**2))
            total_loss = ridge_loss(x, y, thetha, lambda_reg)  # full-batch loss
            total_loss_log.append(total_loss)
    normal_loss = cost(x, y, thetha)  # unregularized loss; cost() is defined elsewhere
    print('Last Step Size + Iters: ', step, total_iter)
    loss_log = np.array(loss_log)
    theta_log = pd.DataFrame(theta_log)
    ridge_log = np.array(ridge_log)
    return (loss_log, theta_log, ridge_log, thetha, normal_loss, total_loss_log)
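For context, this is roughly how I call the function (it also needs the ridge_loss function shown further down). The toy data and hyperparameter values here are just illustrative, and the cost() below is a stand-in for my unregularized loss helper, which is not shown above:

import numpy as np
import pandas as pd

def cost(x, y, thetha):
    # stand-in for my unregularized loss helper: mean squared error only
    diff = np.dot(x, thetha) - y
    return (1/x.shape[0])*np.sum(diff**2)

rng = np.random.default_rng(0)
x = pd.DataFrame(rng.normal(size=(50, 3)))              # 50 toy samples, 3 features
true_theta = np.array([1.0, -2.0, 0.5])
y = x.values @ true_theta + rng.normal(scale=0.1, size=50)

# 50 epochs over 50 samples = 2500 steps, so the schedule switch at 1800 is reached
loss_log, theta_log, ridge_log, thetha, normal_loss, total_loss_log = \
    stochastic_grad_descent_step(x, y, np.zeros(3), alpha=0.01,
                                 num_iter=50, lambda_reg=0.01)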
Here is the function used to compute the loss, which I then plot:
def ridge_loss(x, y, thetha, lambda_reg):
    # mean squared error over the whole batch plus the ridge penalty
    diff = np.dot(x, thetha) - y
    cost = (1/x.shape[0])*np.sum(diff**2) + lambda_reg*np.sum(thetha**2)
    return cost
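As a quick sanity check, with made-up numbers the result can be verified by hand:

x_toy = pd.DataFrame([[1.0, 0.0], [0.0, 1.0]])
y_toy = np.array([1.0, 2.0])
theta_toy = np.array([0.5, 0.5])
# residuals are [-0.5, -1.5]: mean squared error = (0.25 + 2.25)/2 = 1.25
# penalty = 0.1 * (0.25 + 0.25) = 0.05, so the total should be 1.30
print(ridge_loss(x_toy, y_toy, theta_toy, lambda_reg=0.1))  # prints 1.3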
[Graphs of the loss were attached here: one for the 1/sqrt(n) step size and one for 1/n.]
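For reference, the plots are just the logged full-batch loss against the update step, e.g. (matplotlib assumed):

import matplotlib.pyplot as plt

plt.plot(total_loss_log)   # full-batch ridge loss after every SGD update
plt.xlabel('update step')
plt.ylabel('ridge loss')
plt.show()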
Please let me know how I can fix this.