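I have been reading about momentum and am trying to implement the momentum update in my mini-batch gradient descent code.
The problem is that it does not work: the fitted regression line is far from the ideal line, and I am not sure whether my implementation is correct.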
import math

def stochastic_gradient_descent_step(m, b, data_sample):
    n_points = data_sample.shape[0]  # size of the mini-batch
    m_grad = 0
    b_grad = 0
    stepper = 0.0001  # this is the learning rate
    z_m = 1.0
    z_b = 1.0
    betha = 0.81
    for i in range(n_points):
        # Get current pair (x, y)
        x = data_sample[i, 0]
        y = data_sample[i, 1]
        if math.isnan(x) or math.isnan(y):  # skip rows with missing data so they do not break the update
            # print("is nan")
            continue
        # Compute the partial derivatives for each point in the data
        # Partial derivative with respect to 'm'
        dm = -((2 / n_points) * x * (y - (m * x + b)))
        # Partial derivative with respect to 'b'
        db = -((2 / n_points) * (y - (m * x + b)))
        # Update gradient
        m_grad = m_grad + dm
        b_grad = b_grad + db
    # Calculate the momentum
    z_m = betha * z_m + m_grad
    z_b = betha * z_b + b_grad
    # Set the new 'better' updated 'm' and 'b'
    m_updated = m - stepper * z_m
    b_updated = b - stepper * z_b
    return m_updated, b_updated
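Edit
I have now edited my code. As Sasha suggested, I put the gradient computation in one function and the momentum update in another, and I made z_m and z_b global so that they do not lose their values between iterations.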
z_m = 0.0  # initialise to 0
z_b = 0.0  # initialise to 0

def getGradient(m, b, data_sample):
    global z_m
    global z_b
    n_points = data_sample.shape[0]  # size of the mini-batch
    m_grad = 0
    b_grad = 0
    stepper = 0.0001  # this is the learning rate
    betha = 0.81
    for i in range(n_points):
        # Get current pair (x, y)
        x = data_sample[i, 0]
        y = data_sample[i, 1]
        if math.isnan(x) or math.isnan(y):  # skip rows with missing data so they do not break the update
            # print("is nan")
            continue
        # Compute the partial derivatives for each point in the data
        # Partial derivative with respect to 'm'
        dm = -((2 / n_points) * x * (y - (m * x + b)))
        # Partial derivative with respect to 'b'
        db = -((2 / n_points) * (y - (m * x + b)))
        # Update gradient
        m_grad = m_grad + dm
        b_grad = b_grad + db
    return m_grad, b_grad

def calculateMomentum(m_grad, b_grad, betha=0.81, stepper=0.0001):
    global z_m, z_b
    # Calculate the momentum
    z_m = betha * z_m + m_grad
    z_b = betha * z_b + b_grad
    # Set the new 'better' updated 'm' and 'b'
    # (m and b are not parameters of this function, so they must exist as module-level variables)
    m_updated = m - stepper * z_m
    b_updated = b - stepper * z_b
    return m_updated, b_updated
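Now the regression line is computed correctly (I think). With plain SGD the final error is 59706304, and with momentum it is 56729062, but the difference might simply come from the random mini-batches chosen when computing the gradient.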
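One way to check whether the gap comes from the random mini-batches is to feed both runs exactly the same batch sequence. The following is only a rough sketch under assumptions that are not in the code above: the data is a plain list of (x, y) pairs without NaN values rather than a NumPy array, and the helper names `sample_batches`, `mse` and `run`, the toy data, the seed and the step size are all made up for illustration.

```python
import random

def sample_batches(data, batch_size, n_steps, seed=0):
    """Draw a fixed sequence of mini-batches; the same seed gives the same batches."""
    rng = random.Random(seed)
    return [rng.sample(data, batch_size) for _ in range(n_steps)]

def mse(m, b, data):
    """Mean squared error of the line y = m*x + b on the whole data set."""
    return sum((y - (m * x + b)) ** 2 for x, y in data) / len(data)

def run(batches, stepper, betha):
    """betha=0.0 reduces to plain mini-batch SGD; betha>0 adds momentum."""
    m = b = z_m = z_b = 0.0
    for batch in batches:
        n = len(batch)
        m_grad = sum(-2.0 / n * x * (y - (m * x + b)) for x, y in batch)
        b_grad = sum(-2.0 / n * (y - (m * x + b)) for x, y in batch)
        z_m = betha * z_m + m_grad
        z_b = betha * z_b + b_grad
        m -= stepper * z_m
        b -= stepper * z_b
    return m, b

# Toy data (hypothetical): points on the line y = 3x + 2
data = [(float(x), 3.0 * x + 2.0) for x in range(20)]
batches = sample_batches(data, batch_size=5, n_steps=500, seed=42)

# Both runs see identical batches, so any difference is due to momentum alone
err_sgd = mse(*run(batches, stepper=0.001, betha=0.0), data)
err_mom = mse(*run(batches, stepper=0.001, betha=0.81), data)
```

Repeating the comparison over several seeds would give a better idea of how much of the difference is noise.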
Answer 0 (score: 0)
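First, the initialization is wrong: z_m and z_b should be initialized to 0 (since that is your first guess for the gradient). Second, in its current form the function never stores z_m or z_b for the next iteration, so they are reset (to the invalid value 1) on every call.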
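To illustrate both points, here is a minimal sketch that starts the velocities at 0 and passes them in and out of the step function, so they survive between iterations without globals. It makes some assumptions that are not in the question: the mini-batch is a plain list of (x, y) pairs instead of a NumPy array, the gradient is averaged over the valid (non-NaN) rows only, and the names `gradient` and `momentum_step` as well as the toy data and the larger step size in the usage loop are hypothetical.

```python
import math

def gradient(m, b, batch):
    """MSE gradient with respect to m and b over one mini-batch, skipping NaN rows."""
    rows = [(x, y) for x, y in batch if not (math.isnan(x) or math.isnan(y))]
    n = len(rows)
    if n == 0:
        return 0.0, 0.0
    m_grad = sum(-2.0 / n * x * (y - (m * x + b)) for x, y in rows)
    b_grad = sum(-2.0 / n * (y - (m * x + b)) for x, y in rows)
    return m_grad, b_grad

def momentum_step(m, b, z_m, z_b, batch, stepper=0.0001, betha=0.81):
    """One momentum update; the caller keeps z_m and z_b between calls."""
    m_grad, b_grad = gradient(m, b, batch)
    z_m = betha * z_m + m_grad
    z_b = betha * z_b + b_grad
    return m - stepper * z_m, b - stepper * z_b, z_m, z_b

# Usage on toy data (hypothetical): points on the line y = 2x + 1
data = [(float(x), 2.0 * x + 1.0) for x in range(10)]
m, b = 0.0, 0.0
z_m, z_b = 0.0, 0.0  # velocities start at 0, not 1
for _ in range(2000):
    m, b, z_m, z_b = momentum_step(m, b, z_m, z_b, data, stepper=0.01)
```

Returning the velocities makes the state explicit; a small class holding z_m and z_b would work just as well.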