Question

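I have tried raising and lowering the learning rate, but it either does not converge or takes forever. With a learning rate of 0.0004 it slowly tries to converge, but it needs so many iterations that I had to set more than 1 million of them, and the least-squares error only dropped from 93 to 58.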

I am following Andrew Ng's formulation.

[Image: graph with the gradient line]

My code:

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import matplotlib.patches as mpatches
import time


data = pd.read_csv('weight-height.csv')
x = np.array(data['Height'])
y = np.array(data['Weight'])


plt.scatter(x, y, c='blue')
plt.suptitle('Male')
plt.xlabel('Height')
plt.ylabel('Weight')
total = mpatches.Patch(color='blue', label='Total amount of data {}'.format(len(x)))
plt.legend(handles=[total])

theta0 = 0
theta1 = 0
learning_rate = 0.0004
epochs = 10000


# gradient = theta0 + theta1*X


def hypothesis(x):
    return theta0 + theta1 * x


def cost_function(x):
    return 1 / (2 * len(x)) * sum((hypothesis(x) - y) ** 2)

start = time.time()

for i in range(epochs):
    print(f'{i}/ {epochs}')
    theta0 = theta0 - learning_rate * 1/len(x) * sum (hypothesis(x) - y)
    theta1 = theta1 - learning_rate * 1/len(x) * sum((hypothesis(x) - y) * x)
    print('\ncost: {}\ntheta0: {},\ntheta1: {}'.format(cost_function(x), theta0, theta1))

end = time.time()

plt.plot(x, hypothesis(x), c= 'red')


print('\ncost: {}\ntheta0: {},\ntheta1: {}'.format(cost_function(x), theta0, theta1))

print('time finished at {} seconds'.format(end - start))

plt.show()

Answer 1

Your problem may be that you are updating theta0 and theta1 one after the other, rather than simultaneously:

theta0 = theta0 - learning_rate * 1/len(x) * sum (hypothesis(x) - y)
# the update to theta1 is now using the updated version of theta0
theta1 = theta1 - learning_rate * 1/len(x) * sum((hypothesis(x) - y) * x)

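It would be better to rewrite this so that the hypothesis function is called only once per iteration, and to pass it the theta0 and theta1 values explicitly instead of relying on the globals.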

# modify to explicitly pass theta0/1
def hypothesis(x, theta0, theta1):
    return theta0 + theta1 * x

# explicitly pass y
def cost_function(x, y, theta0, theta1):
    return 1 / (2 * len(x)) * sum((hypothesis(x, theta0, theta1) - y) ** 2)

for i in range(epochs):
    print(f'{i}/ {epochs}')
    # calculate hypothesis once
    delta = hypothesis(x, theta0, theta1)
    theta0 = theta0 - learning_rate * 1/len(x) * sum (delta - y)
    theta1 = theta1 - learning_rate * 1/len(x) * sum((delta - y) * x)
    # report the cost together with both parameters (format() needs all three values)
    print('\ncost: {}\ntheta0: {},\ntheta1: {}'.format(cost_function(x, y, theta0, theta1), theta0, theta1))
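
For reference, here is a minimal self-contained sketch of the same simultaneous update written with NumPy reductions. The `gradient_step` helper and the synthetic data are my own assumptions for illustration (the real `weight-height.csv` is not used), so treat it as a sketch rather than a drop-in replacement.

```python
import numpy as np

def gradient_step(x, y, theta0, theta1, learning_rate):
    """One batch gradient descent step with a simultaneous update (hypothetical helper)."""
    error = theta0 + theta1 * x - y      # residuals computed from the OLD parameters
    grad0 = error.mean()                 # derivative of the cost w.r.t. theta0
    grad1 = (error * x).mean()           # derivative of the cost w.r.t. theta1
    # Both new values are derived from the same residuals before either is assigned.
    return theta0 - learning_rate * grad0, theta1 - learning_rate * grad1

# Synthetic data, assumed purely for illustration
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 2.0, size=200)
y = 3.0 + 2.0 * x + rng.normal(0.0, 0.1, size=200)

theta0, theta1 = 0.0, 0.0
for _ in range(5000):
    theta0, theta1 = gradient_step(x, y, theta0, theta1, learning_rate=0.1)

print(theta0, theta1)
```

Returning both values from one function makes it impossible for the theta1 update to see the already-updated theta0 by accident.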

Answer 2

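Looking back, I managed to solve this by using feature scaling: I normalized the data instead of using the raw values, and it then converged quickly.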
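
A minimal sketch of what that can look like, assuming standardization (zero mean, unit standard deviation) as the scaling method; the post does not say which kind of scaling was used. The data here is synthetic and only stands in for the Height and Weight columns, and the learning rate and iteration count are illustrative choices.

```python
import numpy as np

# Synthetic stand-in for the Height/Weight columns (an assumption, not the real CSV)
rng = np.random.default_rng(0)
height = rng.normal(66.0, 4.0, size=500)
weight = -350.0 + 7.7 * height + rng.normal(0.0, 10.0, size=500)

# Standardize the feature: zero mean, unit standard deviation
mu, sigma = height.mean(), height.std()
x_scaled = (height - mu) / sigma

theta0, theta1 = 0.0, 0.0
learning_rate = 0.1      # much larger than 0.0004, yet stable on the scaled feature
for _ in range(500):
    error = theta0 + theta1 * x_scaled - weight
    # Tuple assignment evaluates both right-hand sides first: a simultaneous update
    theta0, theta1 = (theta0 - learning_rate * error.mean(),
                      theta1 - learning_rate * (error * x_scaled).mean())

# Convert the parameters back to the original units of height
slope = theta1 / sigma
intercept = theta0 - slope * mu
print(intercept, slope)
```

The last two lines matter if you want to plot the line against the unscaled heights: the fitted theta values apply to `x_scaled`, not to the raw feature.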

[Image: graph]
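
One way to see why scaling helps is to look at the curvature of the cost. For the cost 1/(2m) * sum((theta0 + theta1*x - y)^2), the Hessian is [[1, mean(x)], [mean(x), mean(x^2)]]. Gradient descent is only stable for learning rates below 2 divided by its largest eigenvalue, and the ratio of largest to smallest eigenvalue governs how slowly the worst direction converges. The sketch below compares a raw and a standardized feature. The heights are hypothetical values (mean about 66, spread about 4), since the actual data is not shown in the post, and `curvature_eigenvalues` is a helper name I made up.

```python
import numpy as np

def curvature_eigenvalues(x):
    """Eigenvalues (ascending) of the Hessian of the least-squares cost for one feature."""
    hessian = np.array([[1.0, x.mean()],
                        [x.mean(), (x ** 2).mean()]])
    return np.linalg.eigvalsh(hessian)

rng = np.random.default_rng(0)
raw = rng.normal(66.0, 4.0, size=500)            # hypothetical heights
scaled = (raw - raw.mean()) / raw.std()          # standardized copy

for name, feature in [("raw", raw), ("scaled", scaled)]:
    smallest, largest = curvature_eigenvalues(feature)
    print(name,
          "max stable learning rate:", 2.0 / largest,
          "condition number:", largest / smallest)
```

On data like this, the raw feature allows only a learning rate of a few ten-thousandths and has a very large condition number, which would be consistent with 0.0004 crawling for a million iterations. After standardization both eigenvalues are close to 1.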

Python batch gradient descent does not converge
