Python implementations of GD and SGD for linear regression

Date: 2018-02-17 17:00:47

Tags: python machine-learning gradient gradient-descent

I am trying to understand and implement these algorithms on a simple linear regression example. I understand that full batch gradient descent uses all of the data to compute the gradient, while stochastic gradient descent uses only one sample.

Full batch gradient descent:

import pandas as pd
from math import sqrt

df = pd.read_csv("data.csv")
df = df.sample(frac=1)  # shuffle the rows
X = df['X'].values
y = df['y'].values

m_current = 0
b_current = 0

epochs = 100000
learning_rate = 0.0001
N = float(len(y))

for i in range(epochs):
    y_current = (m_current * X) + b_current  # predictions with the current slope and intercept
    cost = sum([data**2 for data in (y - y_current)]) / N  # mean squared error
    rmse = sqrt(cost)

    # gradients of the MSE cost with respect to slope (m) and intercept (b)
    m_gradient = -(2/N) * sum(X * (y - y_current))
    b_gradient = -(2/N) * sum(y - y_current)

    m_current = m_current - (learning_rate * m_gradient)
    b_current = b_current - (learning_rate * b_gradient)

print("RMSE: ", rmse)

Full batch gradient descent output: RMSE: 10.597894381512043
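For reference, the gradient expressions inside the loop follow from differentiating the mean squared error cost with respect to the slope m and the intercept b:

J(m, b) = \frac{1}{N} \sum_{i=1}^{N} \bigl(y_i - (m x_i + b)\bigr)^2

\frac{\partial J}{\partial m} = -\frac{2}{N} \sum_{i=1}^{N} x_i (y_i - \hat{y}_i), \qquad \frac{\partial J}{\partial b} = -\frac{2}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)

where \hat{y}_i = m x_i + b is the current prediction.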

Now I tried to implement stochastic gradient descent on top of this code, and it looks like this:

import pandas as pd
from math import sqrt

df = pd.read_csv("data.csv")
df = df.sample(frac=1)
X = df['X'].values
y = df['y'].values

m_current=0
b_current=0

epochs=100000
learning_rate=0.0001
N = float(len(y))

mini = df.sample(n=1) # get one random row from dataset

X_mini = mini['X'].values
y_mini = mini['y'].values

for i in range(epochs):
    y_current = (m_current * X) + b_current
    cost = sum([data**2 for data in (y-y_current)]) / N
    rmse = sqrt(cost)

    m_gradient = -(2/N) * (X_mini * (y_mini - y_current))
    b_gradient = -(2/N) * (y_mini - y_current)

    m_current = m_current - (learning_rate * m_gradient)
    b_current = b_current - (learning_rate * b_gradient)

print("RMSE: ", rmse)

Output: RMSE: 27.941268469783633 RMSE: 20.919246260939282 RMSE: 31.100985268167648 RMSE: 21.023479528518386 RMSE: 19.920972478204785 ......

The results I get with sklearn's SGDRegressor (using the same settings):

import pandas as pd
from sklearn import linear_model
from sklearn.metrics import mean_squared_error
from math import sqrt

data = pd.read_csv('data.csv')

x = data.X.values.reshape(-1,1)
y = data.y.values.reshape(-1,1).ravel()

Model = linear_model.SGDRegressor(alpha = 0.0001, shuffle=True, max_iter = 100000)
Model.fit(x,y)
y_predicted = Model.predict(x)

mse = mean_squared_error(y, y_predicted)
print("RMSE: ", sqrt(mse))

Outputs: RMSE: 10.995881334048224 RMSE: 11.75907544873036 RMSE: 12.981134247509486 RMSE: 12.298263437187988 RMSE: 12.549948073154608 ......

The results from my algorithm above are worse than those of the scikit model, so I would like to know where I went wrong. My algorithm is also quite slow (it takes several seconds)..

1 Answer:

Answer 0 (score: 0):

You appear to be passing your learning rate to SGDRegressor as alpha. alpha is not the learning rate.

To use a constant learning rate, set SGDRegressor's learning_rate to 'constant' and eta0 to your learning rate.

You also need to set alpha to 0, because it is the regularization term and your implementation does not use one.

Also note that, since these algorithms are stochastic by nature, it is probably a good idea to set random_state to some fixed value.
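Putting those points together, a minimal sketch of the suggested configuration might look like this (the eta0, max_iter and shuffle values simply mirror the question's settings, and random_state=42 is just an arbitrary fixed seed):

import pandas as pd
from sklearn import linear_model
from sklearn.metrics import mean_squared_error
from math import sqrt

data = pd.read_csv('data.csv')
x = data.X.values.reshape(-1, 1)
y = data.y.values.ravel()

# learning_rate='constant' with eta0 gives a fixed step size, alpha=0 disables
# the L2 regularization term, and random_state fixes the shuffling seed.
model = linear_model.SGDRegressor(
    learning_rate='constant',
    eta0=0.0001,
    alpha=0,
    max_iter=100000,
    shuffle=True,
    random_state=42,
)
model.fit(x, y)

y_predicted = model.predict(x)
print("RMSE: ", sqrt(mean_squared_error(y, y_predicted)))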