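This should be a very simple question for anyone who has used sklearn.linear_model.LassoCV successfully.
I am running my first Lasso regression on a very simple simulated dataset, shown below. My results are unsatisfactory, and I would like to know what I am doing wrong.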
import numpy as np
import pandas as pd
import matplotlib.pyplot as pp
from sklearn.linear_model import LassoCV
X = np.random.uniform(0, 10, 100)
e = np.random.uniform(0, 1, 100)
# coefficients
b0, b1, b2, b3 = 0.0, 0.1, 0.2, 0.3
# target
Y = b0 + b1 * X + b2 * X**2 + b3 * X**3 + e
# the dataset x**1, ..., x**10
data = pd.DataFrame({"Y":Y, "X1":X})
for i in range(2, 11): data["X{:d}".format(i)] = data["X1"]**i
X = data.drop(axis = 1, labels = 'Y')
Y = data['Y']
from sklearn.preprocessing import StandardScaler
# standardize the data
scaler = StandardScaler()
Xscl = pd.DataFrame( data = scaler.fit_transform(X), columns = ['X{:d}'.format(i) for i in range(1, 11)] )
# lasso constraints
alphas = np.logspace(-3, 1, 1000)
# perform regression with 10 fold cv
model = LassoCV(alphas = alphas, cv = 10, max_iter=10000, tol=0.0001, eps = 0.0001)
result = model.fit(Xscl, Y)
# reverse scale coefficients and plot fit over data
coeff = model.coef_/scaler.scale_
x1 = np.linspace(0., 10., 100)
pp.plot(X['X1'], Y, 'o')
pp.plot(x1, np.polyval(coeff[::-1], x1), '--')
# print(model.coef_)
# [-6.3122168
# 38.18296697
# 30.20713128
# 16.3567352
# 7.30950212
# 2.27074138
# 0.
# 0.
# -1.16784659
# -1.88575215]
The fit plotted over the data gives an unsatisfactory result. What am I doing wrong?
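One thing that may be worth checking is the back-transformation done before plotting. The sketch below rebuilds the unscaled polynomial so that it includes the intercept. It assumes a fixed random seed and LassoCV's default alpha grid, neither of which is in the code above, and the names coef_raw, intercept_raw and poly are only illustrative. StandardScaler subtracts each column mean, so the raw intercept should be model.intercept_ minus the sum of coef_raw * scaler.mean_, and np.polyval expects that constant as the last entry.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# Assumption: fixed seed, used here only to make the sketch reproducible
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 0.1 * x + 0.2 * x**2 + 0.3 * x**3 + rng.uniform(0, 1, 100)

# Design matrix with columns x**1 ... x**10, as in the question
X = np.column_stack([x**i for i in range(1, 11)])
scaler = StandardScaler()
Xscl = scaler.fit_transform(X)

# Assumption: default alpha grid instead of the custom one in the question
model = LassoCV(cv=10, max_iter=10000).fit(Xscl, y)

# Undo the scaling: slopes are divided by the column scale, and the
# intercept absorbs the column means that StandardScaler subtracted
coef_raw = model.coef_ / scaler.scale_
intercept_raw = model.intercept_ - np.sum(coef_raw * scaler.mean_)

# np.polyval wants the highest degree first and the constant term last
poly = np.concatenate([coef_raw[::-1], [intercept_raw]])

# Compare the polynomial with the model's own predictions
y_poly = np.polyval(poly, x)
y_pred = model.predict(Xscl)
max_diff = np.max(np.abs(y_poly - y_pred))
print(max_diff)
```

If max_diff is close to zero, the polynomial reproduces model.predict, so any remaining mismatch with the data would come from the fit itself rather than from the plotting step.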