I have a small script whose cost converges to zero on the dataset xa and ya, but no matter what values I use for "iterations" and "learning_rate", the best cost I can reach with datasets xb and yb is 31.604.
My question is: should the cost always tend toward zero? If so, what am I doing wrong with datasets xb and yb?
import numpy as np

def gradient_descent(x, y):
    m_curr = b_curr = 0
    iterations = 1250
    n = len(x)
    learning_rate = 0.08
    for i in range(iterations):
        y_predicted = (m_curr * x) + b_curr
        cost = (1/n) * sum([val**2 for val in (y - y_predicted)])
        m_der = -(2/n) * sum(x * (y - y_predicted))
        b_der = -(2/n) * sum(y - y_predicted)
        m_curr = m_curr - (learning_rate * m_der)
        b_curr = b_curr - (learning_rate * b_der)
        print('m {}, b {}, cost {}, iteration {}'.format(m_curr, b_curr, cost, i))
xa = np.array([1, 2, 3, 4, 5])
ya = np.array([5, 7, 9, 11, 13])
# xb = np.array([92, 56, 88, 70, 80, 49, 65, 35, 66, 67])
# yb = np.array([98, 68, 81, 80, 83, 52, 66, 30, 68, 73])
gradient_descent(xa, ya)
# gradient_descent(xb, yb)
With xa and ya (using the iterations and learning_rate values shown above):
m 2.000000000000002, b 2.999999999999995, cost 1.0255191767873153e-29, iteration 1245
m 2.000000000000001, b 2.9999999999999947, cost 1.0255191767873153e-29, iteration 1246
m 2.000000000000002, b 2.999999999999995, cost 1.0255191767873153e-29, iteration 1247
m 2.000000000000001, b 2.9999999999999947, cost 1.0255191767873153e-29, iteration 1248
m 2.000000000000002, b 2.999999999999995, cost 1.0255191767873153e-29, iteration 1249
With xb and yb (iterations = 1000, learning_rate = 0.00001):
m 1.0445229983270568, b 0.01691112775956422, cost 31.811378572605147, iteration 995
m 1.0445229675787642, b 0.01691330681124408, cost 31.81137809768319, iteration 996
m 1.044522936830507, b 0.016915485860422623, cost 31.811377622762304, iteration 997
m 1.044522906082285, b 0.016917664907099856, cost 31.811377147842503, iteration 998
m 1.0445228753340983, b 0.01691984395127578, cost 31.811376672923775, iteration 999
With xb and yb (iterations = 200000, learning_rate = 0.00021):
m 1.017952329085966, b 1.8999054866690825, cost 31.604524796644444, iteration 199995
m 1.0179523238769337, b 1.8999058558198456, cost 31.60452479599536, iteration 199996
m 1.0179523186680224, b 1.89990622496171, cost 31.604524795346318, iteration 199997
m 1.017952313459241, b 1.899906594094676, cost 31.60452479469731, iteration 199998
m 1.017952308250581, b 1.8999069632187437, cost 31.604524794048356, iteration 199999
Answer (score: 0)
Glad it helped you understand. Consolidating the comments into an answer to this question.
Gradient descent will always move toward a local/global minimum. It does this to minimize the error/cost, so that the output (Y) can be computed as accurately as possible from the provided input values (X).
You are solving the equation y = mx + b.
With your xa, ya data, it can fit the line exactly, driving the error to ~0, i.e. both sides of the equation balance. But in the xb, yb case it can only get the error down to ~31.
The cost here is simply the mean squared error that gradient descent minimizes while fitting the line.
Try calculating both sides of the equation manually; it will become clear.
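For example (a quick check of my own, not part of the original answer): the xa, ya data lie exactly on the line y = 2x + 3, which is why gradient descent converges to m ≈ 2, b ≈ 3 with cost ≈ 0:

    import numpy as np

    xa = np.array([1, 2, 3, 4, 5])
    ya = np.array([5, 7, 9, 11, 13])

    # Every point satisfies y = 2x + 3 exactly, so all residuals are zero
    # and the minimum achievable cost (mean squared error) is 0.
    residuals = ya - (2 * xa + 3)
    print(residuals)               # [0 0 0 0 0]
    print(np.mean(residuals**2))   # 0.0

No such integer m and b exist for xb, yb, so its residuals can never all be zero.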
In other words, when you use the x values to predict y, the mean error is ~0 for the xa, ya data and ~31 for the xb, yb data.
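To confirm that ~31.6 really is the floor (my own sanity check, not from the original answer), you can compare against the closed-form least-squares fit, e.g. via np.polyfit. The best possible straight line for xb, yb still leaves a mean squared error of about 31.6, so gradient descent is behaving correctly; the data simply are not perfectly linear:

    import numpy as np

    xb = np.array([92, 56, 88, 70, 80, 49, 65, 35, 66, 67])
    yb = np.array([98, 68, 81, 80, 83, 52, 66, 30, 68, 73])

    # Closed-form least-squares fit of degree 1 gives the optimal m and b.
    m, b = np.polyfit(xb, yb, 1)
    mse = np.mean((yb - (m * xb + b)) ** 2)

    print(m, b)   # roughly m ≈ 1.0177, b ≈ 1.915
    print(mse)    # ≈ 31.60 -- no iterations/learning_rate choice can beat this

So the cost does not always go to zero: it converges to the smallest error the model (here, a straight line) can achieve on the given data.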