Question

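I am reading about momentum and I am trying to implement the momentum equation in my mini-batch code.

The problem is that it is not working: the regression line ends up too far from the ideal line, and I am not sure whether the implementation is correct.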

import math

def stochastic_gradient_descent_step(m,b,data_sample):

    n_points = data_sample.shape[0] #size of data
    m_grad = 0
    b_grad = 0
    stepper = 0.0001 #this is the learning rate
    z_m = 1.0
    z_b = 1.0
    betha = 0.81

    for i in range(n_points):

        #Get current pair (x,y)
        x = data_sample[i,0]
        y = data_sample[i,1]
        if math.isnan(x) or math.isnan(y): #skip rows with missing data so NaN does not propagate
            #print("is nan")
            continue

        #calculate the partial derivatives for each point in the data
        #Partial derivative with respect to 'm'
        dm = -((2/n_points) * x * (y - (m*x + b)))

        #Partial derivative with respect to 'b'
        db = - ((2/n_points) * (y - (m*x + b)))


        #Update gradient
        m_grad = m_grad + dm
        b_grad = b_grad + db

    #calculate the momentum
    z_m = betha*z_m + m_grad
    z_b = betha*z_b + b_grad
    #Set the new 'better' updated 'm' and 'b'   
    m_updated = m - stepper*z_m
    b_updated = b - stepper*z_b

    return m_updated,b_updated

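Edit

I have now edited my code. As Sasha suggested, I put the gradient calculation in one function and the momentum update in another, and I made z_m and z_b global so that they do not lose their values between iterations.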

z_m =0.0 #initialise to 0
z_b =0.0 #initialise to 0
def getGradient(m,b,data_sample):
    global z_m
    global z_b
    n_points = data_sample.shape[0] #size of data
    m_grad = 0
    b_grad = 0
    stepper = 0.0001 #this is the learning rate

    betha = 0.81

    for i in range(n_points):

        #Get current pair (x,y)
        x = data_sample[i,0]
        y = data_sample[i,1]
        if math.isnan(x) or math.isnan(y): #skip rows with missing data so NaN does not propagate
            #print("is nan")
            continue

        #calculate the partial derivatives for each point in the data
        #Partial derivative with respect to 'm'
        dm = -((2/n_points) * x * (y - (m*x + b)))

        #Partial derivative with respect to 'b'
        db = - ((2/n_points) * (y - (m*x + b)))


        #Update gradient
        m_grad = m_grad + dm
        b_grad = b_grad + db


    return m_grad,b_grad

def calculateMomentum(m_grad,b_grad,betha=0.81,stepper=0.0001):
    global z_m,z_b
    #note: m and b are not arguments here, so they must exist as module-level variables
    #update the momentum (velocity) terms
    z_m = betha*z_m + m_grad
    z_b = betha*z_b + b_grad
    #Set the new 'better' updated 'm' and 'b'
    m_updated = m - stepper*z_m
    b_updated = b - stepper*z_b
    return m_updated,b_updated

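The regression line is now computed correctly (probably). With plain SGD the final error is 59706304 and with momentum the final error is 56729062, but the difference could simply come from the random mini-batches chosen when computing the gradients.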
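One way to check whether a gap like that comes from the random mini-batches rather than from momentum is to feed both runs the same batch sequence by seeding the sampler. The following is only a rough sketch under assumptions: the `run` and `mse` helpers, the synthetic data, and the step size of 0.05 are all made up for illustration and are not taken from my real script. Setting `betha=0.0` reduces the update to plain mini-batch SGD.

```python
import numpy as np

def mse(m, b, data):
    """Mean squared error of the line y = m*x + b on the whole data set."""
    x, y = data[:, 0], data[:, 1]
    return float(np.mean((y - (m * x + b)) ** 2))

def run(data, betha, stepper=0.05, n_steps=200, batch_size=20, seed=0):
    """Hypothetical training loop; betha=0.0 gives plain mini-batch SGD."""
    rng = np.random.default_rng(seed)  # same seed -> same sequence of batches
    m = b = z_m = z_b = 0.0            # momentum terms start at 0
    for _ in range(n_steps):
        batch = data[rng.choice(len(data), size=batch_size, replace=False)]
        x, y = batch[:, 0], batch[:, 1]
        residual = y - (m * x + b)
        # Momentum update: z = betha*z + gradient of the batch MSE
        z_m = betha * z_m - 2.0 * np.mean(x * residual)
        z_b = betha * z_b - 2.0 * np.mean(residual)
        m -= stepper * z_m
        b -= stepper * z_b
    return mse(m, b, data)

# Synthetic data (an assumption for the demo): y = 3x - 0.5 plus small noise
data_rng = np.random.default_rng(42)
x = data_rng.uniform(-1, 1, size=500)
y = 3.0 * x - 0.5 + data_rng.normal(scale=0.1, size=500)
data = np.column_stack([x, y])

# Compare both methods on identical batch sequences, for a few seeds
for seed in range(3):
    sgd_error = run(data, betha=0.0, seed=seed)
    momentum_error = run(data, betha=0.81, seed=seed)
    print(seed, sgd_error, momentum_error)
```

If the ordering of the two errors stays the same across several seeds, the difference is less likely to be an accident of batch selection.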

Answer 1

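First, the initialisation is invalid: z_m and z_b should be initialised to 0 (as this is your first guess about the gradient). Second, with the current form of the function you never "store" z_m or z_b for the next iteration, so they are reset on every call (to the invalid value of 1).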
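A minimal sketch of both points, assuming NumPy is available: the momentum terms start at 0 and are passed in and returned on every call, so the caller keeps them between iterations. The function names `gradient` and `momentum_step`, the synthetic data, and the step size used in the demo are illustrative choices, not part of the original code. Note one deliberate difference: this version averages over the valid rows only, whereas the code in the question divides by the full batch size even when NaN rows are skipped.

```python
import numpy as np

def gradient(m, b, batch):
    """Gradient of the mean squared error of y = m*x + b on one mini-batch."""
    batch = batch[~np.isnan(batch).any(axis=1)]  # drop rows with missing values
    x, y = batch[:, 0], batch[:, 1]
    residual = y - (m * x + b)
    m_grad = -2.0 * np.mean(x * residual)
    b_grad = -2.0 * np.mean(residual)
    return m_grad, b_grad

def momentum_step(m, b, z_m, z_b, batch, stepper=0.0001, betha=0.81):
    """One momentum update; returns the new parameters AND the new momentum terms."""
    m_grad, b_grad = gradient(m, b, batch)
    z_m = betha * z_m + m_grad
    z_b = betha * z_b + b_grad
    return m - stepper * z_m, b - stepper * z_b, z_m, z_b

# Demo on a noiseless synthetic line y = 2x + 1 (made up for illustration)
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
data = np.column_stack([x, 2.0 * x + 1.0])

m, b = 0.0, 0.0
z_m, z_b = 0.0, 0.0  # initialised to 0, created once outside the loop
for _ in range(1000):
    batch = data[rng.choice(len(data), size=20, replace=False)]
    # z_m and z_b are carried over from the previous iteration
    m, b, z_m, z_b = momentum_step(m, b, z_m, z_b, batch, stepper=0.05)

print(m, b)
```

Returning the state explicitly avoids the global variables, but keeping z_m and z_b as globals (or as attributes of a class) works just as well, as long as they are created once and not reset inside the step function.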

How to implement momentum in mini-batch gradient descent with Python?
