Gradient descent cost convergence

Time: 2018-08-24 17:16:41

Tags: python-3.x numpy gradient-descent

I have a small script whose cost converges to zero with the dataset xa and ya, but with the dataset xb and yb the best cost I can reach is 31.604, no matter what values I use for "iterations" and "learning_rate".

My question is: should the cost always tend to zero? If so, what am I doing wrong with the dataset xb and yb?

import numpy as np


def gradient_descent(x, y):
    m_curr = b_curr = 0  # start the fit at y = 0*x + 0
    iterations = 1250
    n = len(x)
    learning_rate = 0.08

    for i in range(iterations):
        y_predicted = (m_curr * x) + b_curr
        cost = (1 / n) * np.sum((y - y_predicted) ** 2)   # mean squared error
        m_der = -(2 / n) * np.sum(x * (y - y_predicted))  # d(cost)/dm
        b_der = -(2 / n) * np.sum(y - y_predicted)        # d(cost)/db
        m_curr = m_curr - (learning_rate * m_der)         # step against the gradient
        b_curr = b_curr - (learning_rate * b_der)
        print('m {}, b {}, cost {}, iteration {}'.format(m_curr, b_curr, cost, i))


xa = np.array([1, 2, 3, 4, 5])
ya = np.array([5, 7, 9, 11, 13])

# xb = np.array([92, 56, 88, 70, 80, 49, 65, 35, 66, 67])
# yb = np.array([98, 68, 81, 80, 83, 52, 66, 30, 68, 73])

gradient_descent(xa, ya)

# gradient_descent(xb, yb)

Using xa and ya (with the iterations and learning_rate values shown above):

m 2.000000000000002, b 2.999999999999995, cost 1.0255191767873153e-29, iteration 1245
m 2.000000000000001, b 2.9999999999999947, cost 1.0255191767873153e-29, iteration 1246
m 2.000000000000002, b 2.999999999999995, cost 1.0255191767873153e-29, iteration 1247
m 2.000000000000001, b 2.9999999999999947, cost 1.0255191767873153e-29, iteration 1248
m 2.000000000000002, b 2.999999999999995, cost 1.0255191767873153e-29, iteration 1249

Using xb and yb (iterations = 1000, learning_rate = 0.00001):

m 1.0445229983270568, b 0.01691112775956422, cost 31.811378572605147, iteration 995
m 1.0445229675787642, b 0.01691330681124408, cost 31.81137809768319, iteration 996
m 1.044522936830507, b 0.016915485860422623, cost 31.811377622762304, iteration 997
m 1.044522906082285, b 0.016917664907099856, cost 31.811377147842503, iteration 998
m 1.0445228753340983, b 0.01691984395127578, cost 31.811376672923775, iteration 999

Using xb and yb (iterations = 200000, learning_rate = 0.00021):

m 1.017952329085966, b 1.8999054866690825, cost 31.604524796644444, iteration 199995
m 1.0179523238769337, b 1.8999058558198456, cost 31.60452479599536, iteration 199996
m 1.0179523186680224, b 1.89990622496171, cost 31.604524795346318, iteration 199997
m 1.017952313459241, b 1.899906594094676, cost 31.60452479469731, iteration 199998
m 1.017952308250581, b 1.8999069632187437, cost 31.604524794048356, iteration 199999

1 Answer:

Answer 0 (score: 0)

Glad it helped you understand. Consolidating the comments into an answer to this question.

A gradient descent function will always move toward a local/global minimum of the cost. It minimizes the error/cost so that the output (Y) can be predicted as well as possible from the provided input values (X).


You are fitting the equation y = mx + b.

With the xa, ya data it can reach an error of ~0, i.e., exactly balance both sides of the equation. But with xb, yb it can only get the error down to ~31.
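The two cases can be verified directly: ya lies exactly on the line y = 2x + 3 (the values m and b converged to in the output above), so a zero-cost fit exists for that dataset. A minimal check, not part of the original script:

```python
import numpy as np

xa = np.array([1, 2, 3, 4, 5])
ya = np.array([5, 7, 9, 11, 13])

# Every point of (xa, ya) sits exactly on y = 2x + 3,
# which is why the cost can reach zero for this dataset.
print(np.array_equal(ya, 2 * xa + 3))  # True
```

No straight line passes through all ten (xb, yb) points, so the same check fails for any m and b there.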

The cost is simply the mean squared error that gradient descent minimizes while fitting the equation. Try calculating both sides of the equation by hand and it will become clear.

In other words, when you predict y from the x values, the mean error is 0 for the xa, ya data and about 31 for the xb, yb data.
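To confirm that ~31.6 is a property of the data rather than a tuning problem, you can compare against the closed-form least-squares fit. A quick sketch using np.polyfit (not part of the original script):

```python
import numpy as np

xb = np.array([92, 56, 88, 70, 80, 49, 65, 35, 66, 67])
yb = np.array([98, 68, 81, 80, 83, 52, 66, 30, 68, 73])

# Closed-form least-squares line: the best possible (m, b) for this data.
m, b = np.polyfit(xb, yb, 1)

# The mean squared error at that optimum is the floor gradient descent
# converges towards; it is nonzero because the points are not collinear.
mse = np.mean((yb - (m * xb + b)) ** 2)
print(m, b, mse)  # mse is roughly 31.6, matching the gradient descent output
```

No choice of iterations or learning_rate can push the cost below this value; more iterations only bring m and b closer to the polyfit solution.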