Why is my SGD so far off from my linear regression model?

Time: 2015-07-14 15:32:32

Tags: python numpy scikit-learn linear-regression

I am trying to compare linear regression (normal equation) with SGD, but the SGD result looks far off. Am I doing something wrong?

Here is my code:

import numpy as np
from scipy import stats

x = np.random.randint(100, size=1000)
y = x * 0.10
slope, intercept, r_value, p_value, std_err = stats.linregress(x=x, y=y)
print("slope is %f and intercept is %s" % (slope, intercept))
# slope is 0.100000 and intercept is 1.61435309565e-11

Here is my SGD:

from sklearn import linear_model

x = x.reshape(1000, 1)
clf = linear_model.SGDRegressor()
clf.fit(x, y, coef_init=0, intercept_init=0)

print(clf.intercept_)
print(clf.coef_)

#[  1.46746270e+10]
#[  3.14999003e+10]

I expected coef_ and intercept_ to come out nearly the same as the linregress result, since the data is linear.

1 Answer:

Answer 0: (score: 1)

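When I tried to run this code, I got an overflow error. I suspect you have the same problem, but for some reason it did not raise an error in your case.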
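A rough way to see how large feature values can make SGD blow up is a hand-rolled, single-pass SGD loop. This is only a sketch, not scikit-learn's implementation: `sgd_slope` is a hypothetical helper, it fits a slope with no intercept and no penalty, and it assumes the inverse-scaling step size eta0 / t**power_t with eta0=0.01 and power_t=0.25.

```python
def sgd_slope(x, y, eta0=0.01, power_t=0.25):
    """One pass of plain SGD on squared loss for y ~ w * x.

    Hypothetical helper for illustration: no intercept, no penalty.
    """
    w = 0.0
    for t, (xi, yi) in enumerate(zip(x, y), start=1):
        eta = eta0 / t ** power_t        # inverse-scaling step size (assumed)
        w -= eta * (w * xi - yi) * xi    # gradient of 0.5 * (w*xi - yi)**2
    return w


# Features up to 99, in the same range as the question's data.
x_large = [float(k) for k in range(1, 100)]
y_large = [0.10 * xi for xi in x_large]

# The same kind of data generated on a range below 10.
x_small = [xi / 10.0 for xi in x_large]
y_small = [0.10 * xi for xi in x_small]

w_large = sgd_slope(x_large, y_large)
w_small = sgd_slope(x_small, y_small)
print("large features: slope = %g" % w_large)
print("small features: slope = %g" % w_small)
```

With features up to 99, the per-sample factor eta * x**2 exceeds 2 for most samples, so each update overshoots and the error grows. With features below 10 the same factor stays below 1 and the slope settles near 0.10.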

If you scale down the features, everything works as expected. Using scipy.stats.linregress:

>>> x = np.random.random(1000) * 10
>>> y = x * 0.10
>>> slope, intercept, r_value, p_value, std_err = stats.linregress(x=x, y=y)
>>> print("slope is %f and intercept is %s" % (slope,intercept))
slope is 0.100000 and intercept is -2.22044604925e-15

Using linear_model.SGDRegressor:

>>> clf = linear_model.SGDRegressor()
>>> clf.fit(x[:,None], y)
SGDRegressor(alpha=0.0001, epsilon=0.1, eta0=0.01, fit_intercept=True,
       l1_ratio=0.15, learning_rate='invscaling', loss='squared_loss',
       n_iter=5, penalty='l2', power_t=0.25, random_state=None,
       shuffle=False, verbose=0, warm_start=False)
>>> print("slope is %f and intercept is %s" % (clf.coef_, clf.intercept_[0]))
slope is 0.099763 and intercept is 0.00163353754797

The slope is slightly lower, but I guess that is because of the regularization.
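If regenerating the data on a smaller range is not an option, a common alternative is to standardize the features before SGD and convert the fitted coefficients back to the original units. The sketch below is one possible setup and was not part of the original answer: the pipeline, the fixed `random_state=0`, and the back-conversion are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
x = rng.randint(100, size=1000)
y = x * 0.10

# Standardize the single feature, then fit SGD on the scaled values.
model = make_pipeline(StandardScaler(), SGDRegressor(random_state=0))
model.fit(x.reshape(-1, 1), y)

scaler = model.named_steps["standardscaler"]
sgd = model.named_steps["sgdregressor"]

# Convert the coefficients back to the units of the original x.
slope = sgd.coef_[0] / scaler.scale_[0]
intercept = sgd.intercept_[0] - slope * scaler.mean_[0]
print("slope is %f and intercept is %s" % (slope, intercept))
```

The recovered slope should land close to 0.10, with a small shrinkage from the default L2 penalty, as in the scaled-down example.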