How do I use sklearn LassoCV, and what am I doing wrong?

Asked: 2018-09-21 16:56:42

Tags: python scikit-learn linear-regression lasso

This should be a very simple question for anyone who has used sklearn.linear_model.LassoCV successfully.

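I am running my first Lasso regression on a very simple simulated dataset, shown below. The results are unsatisfactory, and I would like to know what I am doing wrong.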

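import numpy as np
import pandas as pd
import matplotlib.pyplot as pp  # plotting alias used below
from sklearn.linear_model import LassoCV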

X = np.random.uniform(0, 10, 100)
e = np.random.uniform(0, 1, 100)
# coefficients
b0, b1, b2, b3 = 0.0, 0.1, 0.2, 0.3
# target
Y = b0 + b1 * X + b2 * X**2 + b3 * X**3 + e
# the dataset x**1, ..., x**10 
data = pd.DataFrame({"Y":Y, "X1":X})
for i in range(2, 11): data["X{:d}".format(i)] = data["X1"]**i

X = data.drop(axis = 1, labels = 'Y')
Y = data['Y']

from sklearn.preprocessing import StandardScaler

# standardize the data
scaler = StandardScaler()
Xscl = pd.DataFrame( data = scaler.fit_transform(X), columns = ['X' + str(i) for i in range(1, 11)] )

# lasso constraints
alphas = np.logspace(-3, 1, 1000)
# perform regression with 10 fold cv
model = LassoCV(alphas = alphas, cv = 10, max_iter=10000, tol=0.0001, eps = 0.0001)
result = model.fit(Xscl, Y)

# reverse scale coefficients and plot fit over data
coeff = model.coef_/scaler.scale_
x1 = np.linspace(0., 10., 100)
pp.plot(X['X1'], Y, 'o')
pp.plot(x1, np.polyval(coeff[::-1], x1), '--')

# print(model.coef_)
# [-6.3122168  
# 38.18296697 
# 30.20713128 
# 16.3567352   
# 7.30950212  
# 2.27074138
# 0.         
# 0.         
# -1.16784659 
# -1.88575215]

Plotting the fit over the data gives the result below. What am I doing wrong?

[Image: plot of the Lasso fit over the simulated data]
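One thing worth checking, offered as a possible cause rather than a confirmed diagnosis: np.polyval expects coefficients ordered from the highest degree down to the constant term. In the code above, coeff[::-1] holds only the ten slopes for x**10 down to x**1, so polyval reads them as x**9 down to x**0, and the intercept (including the shift introduced by centering in StandardScaler) is left out. The sketch below shows one way to map coefficients fitted on standardized features back to the raw polynomial. The helper name unscale_polynomial is hypothetical, and the self-check uses ordinary least squares on a noiseless cubic instead of LassoCV, purely to keep the example small.

```python
import numpy as np

def unscale_polynomial(coef_scaled, intercept_scaled, mean, scale):
    """Map coefficients fitted on standardized powers x**1..x**n back to raw x.

    Returns coefficients ordered highest degree first, constant term last,
    which is the order np.polyval expects. (Hypothetical helper.)
    """
    coef_scaled = np.asarray(coef_scaled, dtype=float)
    mean = np.asarray(mean, dtype=float)
    scale = np.asarray(scale, dtype=float)
    coef_raw = coef_scaled / scale                       # slope per raw feature
    # Centering moved sum(coef_raw * mean) into the intercept; move it back.
    intercept_raw = intercept_scaled - np.sum(coef_raw * mean)
    # coef_raw is ordered x**1..x**n: reverse it and append the constant term.
    return np.concatenate([coef_raw[::-1], [intercept_raw]])


# Self-check with ordinary least squares (not lasso) on a noiseless cubic.
x = np.linspace(0.0, 10.0, 50)
y = 0.5 + 0.1 * x + 0.2 * x**2 + 0.3 * x**3
powers = np.column_stack([x**k for k in range(1, 4)])
mean, scale = powers.mean(axis=0), powers.std(axis=0)    # population std, as StandardScaler uses
Z = (powers - mean) / scale
A = np.column_stack([np.ones_like(x), Z])                # explicit intercept column
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
poly = unscale_polynomial(beta[1:], beta[0], mean, scale)
max_abs_error = np.max(np.abs(np.polyval(poly, x) - y))
```

With the objects from the question, the call would presumably be unscale_polynomial(model.coef_, model.intercept_, scaler.mean_, scaler.scale_), and the returned array can be passed directly to np.polyval together with x1. Whether this fully explains the poor fit is not certain; the strong correlation between the ten powers of x may also affect which coefficients Lasso keeps.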

0 answers:

No answers yet.