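Gradient descent does not converge on a small dataset with very large values

Time: 2019-04-11 00:33:37

Tags: python machine-learning linear-regression data-science

I am trying to write a program that computes the slope and intercept of a linear regression model, but when I run more than 10 iterations, the gradient descent function returns np.nan for both the intercept and the slope.

Here is my implementation: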

def get_gradient_at_b(x, y, b, m):
  N = len(x)
  diff = 0
  for i in range(N):
    x_val = x[i]
    y_val = y[i]
    diff += (y_val - ((m * x_val) + b))
  b_gradient = -(2/N) * diff  
  return b_gradient

def get_gradient_at_m(x, y, b, m):
  N = len(x)
  diff = 0
  for i in range(N):
    x_val = x[i]
    y_val = y[i]
    diff += x_val * (y_val - ((m * x_val) + b))
  m_gradient = -(2/N) * diff  
  return m_gradient

def step_gradient(b_current, m_current, x, y, learning_rate):
    b_gradient = get_gradient_at_b(x, y, b_current, m_current)
    m_gradient = get_gradient_at_m(x, y, b_current, m_current)
    b = b_current - (learning_rate * b_gradient)
    m = m_current - (learning_rate * m_gradient)
    return [b, m]

def gradient_descent(x, y, learning_rate, num_iterations):
  b = 0
  m = 0
  for i in range(num_iterations):
    b, m = step_gradient(b, m, x, y, learning_rate)
  return [b,m]  

I am running it on the following data:

a=[3.87656018e+11, 4.10320300e+11, 4.15730874e+11, 4.52699998e+11,
       4.62146799e+11, 4.78965491e+11, 5.08068952e+11, 5.99592902e+11,
       6.99688853e+11, 8.08901077e+11, 9.20316530e+11, 1.20111177e+12,
       1.18695276e+12, 1.32394030e+12, 1.65661707e+12, 1.82304993e+12,
       1.82763786e+12, 1.85672212e+12, 2.03912745e+12, 2.10239081e+12,
       2.27422971e+12, 2.60081824e+12]
b=[3.3469950e+10, 3.4784980e+10, 3.3218720e+10, 3.6822490e+10,
       4.4560290e+10, 4.3826720e+10, 5.2719430e+10, 6.3842550e+10,
       8.3535940e+10, 1.0309053e+11, 1.2641405e+11, 1.6313218e+11,
       1.8529536e+11, 1.7875143e+11, 2.4981555e+11, 3.0596392e+11,
       3.0040058e+11, 3.1440530e+11, 3.1033848e+11, 2.6229109e+11,
       2.7585243e+11, 3.0352616e+11]

print(gradient_descent(a, b, 0.01, 100))
#result --> [nan, nan]

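When I run the gradient_descent function on a dataset with smaller values, it gives the correct answer. I was also able to get the intercept and slope for the data above using LinearRegression (from sklearn.linear_model import LinearRegression).

Any help figuring out why the result is [nan, nan] instead of the correct intercept and slope would be appreciated.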

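1 Answer:

Answer 0: (score: 1)

You need to lower the learning rate. Because the values in a and b are so large (>= 1e11), the learning rate has to be around 1e-25 for gradient descent to work at all. Otherwise every step overshoots the minimum, because the gradients computed from a and b are huge, and the parameters grow until they overflow to nan.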
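
The 1e-25 figure can be sanity-checked rather than found by trial and error. The sketch below is not part of the original answer. It assumes the mean-squared-error loss that the question's gradient functions correspond to, and the helper name safe_learning_rate is made up for illustration. For that loss the Hessian is 2 * [[1, mean(x)], [mean(x), mean(x^2)]], and gradient descent on a quadratic diverges once the learning rate exceeds 2 divided by the largest Hessian eigenvalue. That eigenvalue is at most the trace, 2 * (1 + mean(x^2)), which gives a simple and slightly conservative bound.

```python
# Sketch (assumption: MSE loss, as implied by the question's gradients).
# Gradient descent is stable when learning_rate < 2 / lambda_max, where
# lambda_max is the largest eigenvalue of the Hessian
#     2 * [[1, mean(x)], [mean(x), mean(x^2)]].
# lambda_max <= trace = 2 * (1 + mean(x^2)), so 1 / (1 + mean(x^2))
# is always on the safe side of that limit.

def safe_learning_rate(x):
    """Return a conservative upper bound on a stable learning rate."""
    mean_x_squared = sum(v * v for v in x) / len(x)
    return 1.0 / (1.0 + mean_x_squared)

# x values copied from the question
a = [3.87656018e+11, 4.10320300e+11, 4.15730874e+11, 4.52699998e+11,
     4.62146799e+11, 4.78965491e+11, 5.08068952e+11, 5.99592902e+11,
     6.99688853e+11, 8.08901077e+11, 9.20316530e+11, 1.20111177e+12,
     1.18695276e+12, 1.32394030e+12, 1.65661707e+12, 1.82304993e+12,
     1.82763786e+12, 1.85672212e+12, 2.03912745e+12, 2.10239081e+12,
     2.27422971e+12, 2.60081824e+12]

bound = safe_learning_rate(a)
print("stable learning rates are below about", bound)
print("0.01 is stable:", 0.01 < bound)
print("5e-25 is stable:", 5e-25 < bound)
```

For this data the bound comes out at roughly 5e-25, so 0.01 is too large by more than twenty orders of magnitude.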

For example, with a learning rate of 5e-25:

b, m = gradient_descent(a, b, 5e-25, 100)
print(b, m)
Out: -3.7387067636195266e-13 0.13854551291084335
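
Note that the intercept in the output above is still essentially zero. With a learning rate this small the slope can settle, but the intercept barely moves in 100 iterations. A common alternative, not part of the original answer, is to standardize both variables before running gradient descent and convert the result back afterwards. The sketch below assumes the same MSE gradients as the question. The function name fit_line_scaled and its default learning rate and iteration count are illustrative choices, not values taken from the post.

```python
import statistics

def fit_line_scaled(x, y, learning_rate=0.1, num_iterations=200):
    """Fit y = m * x + b by gradient descent on standardized data."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    std_x, std_y = statistics.pstdev(x), statistics.pstdev(y)
    # Rescale both variables to zero mean and unit standard deviation.
    xs = [(v - mean_x) / std_x for v in x]
    ys = [(v - mean_y) / std_y for v in y]
    n = len(xs)

    b_s, m_s = 0.0, 0.0
    for _ in range(num_iterations):
        residuals = [yv - (m_s * xv + b_s) for xv, yv in zip(xs, ys)]
        b_grad = -(2 / n) * sum(residuals)
        m_grad = -(2 / n) * sum(xv * r for xv, r in zip(xs, residuals))
        b_s -= learning_rate * b_grad
        m_s -= learning_rate * m_grad

    # Convert slope and intercept back to the original units.
    m = m_s * std_y / std_x
    b = mean_y + std_y * b_s - m * mean_x
    return b, m

# Data copied from the question (renamed to avoid clashing with b, m).
x_data = [3.87656018e+11, 4.10320300e+11, 4.15730874e+11, 4.52699998e+11,
          4.62146799e+11, 4.78965491e+11, 5.08068952e+11, 5.99592902e+11,
          6.99688853e+11, 8.08901077e+11, 9.20316530e+11, 1.20111177e+12,
          1.18695276e+12, 1.32394030e+12, 1.65661707e+12, 1.82304993e+12,
          1.82763786e+12, 1.85672212e+12, 2.03912745e+12, 2.10239081e+12,
          2.27422971e+12, 2.60081824e+12]
y_data = [3.3469950e+10, 3.4784980e+10, 3.3218720e+10, 3.6822490e+10,
          4.4560290e+10, 4.3826720e+10, 5.2719430e+10, 6.3842550e+10,
          8.3535940e+10, 1.0309053e+11, 1.2641405e+11, 1.6313218e+11,
          1.8529536e+11, 1.7875143e+11, 2.4981555e+11, 3.0596392e+11,
          3.0040058e+11, 3.1440530e+11, 3.1033848e+11, 2.6229109e+11,
          2.7585243e+11, 3.0352616e+11]

intercept, slope = fit_line_scaled(x_data, y_data)
print(intercept, slope)
```

After standardization both parameters live on a scale of order 1, so an ordinary learning rate works and the intercept converges along with the slope. The result should then be comparable with what LinearRegression reports for the same data.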