How do I build a prediction function using Lasso and RobustScaler?

Time: 2019-08-14 15:28:22

Tags: python machine-learning scikit-learn lasso-regression

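I am trying to figure out how to predict values with LASSO regression without using the .predict function that Sklearn provides. This is basically just to broaden my understanding of how LASSO works internally. I asked a question on Cross Validated about how LASSO regression works, and one of the comments mentioned that the prediction function works the same way as in linear regression. So I wanted to try writing my own function to do this.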

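I was able to recreate the prediction function successfully in simpler examples, but when I try to combine it with RobustScaler I keep getting different outputs. In this example, the prediction from Sklearn is 4.33 and the prediction from my own function is 6.18. What am I missing here? Am I inverse-transforming the prediction correctly at the end?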

import pandas as pd
from sklearn.preprocessing import RobustScaler
from sklearn.linear_model import Lasso
import numpy as np

df = pd.DataFrame({'Y':[5, -10, 10, .5, 2.5, 15], 'X1':[1., -2.,  2., .1, .5, 3], 'X2':[1, 1, 2, 1, 1, 1], 
              'X3':[6, 6, 6, 5, 6, 4], 'X4':[6, 5, 4, 3, 2, 1]})

X = df[['X1','X2','X3','X4']]
y = df[['Y']]

#Scaling 
transformer_x = RobustScaler().fit(X)
transformer_y = RobustScaler().fit(y) 
X_scal = transformer_x.transform(X)
y_scal = transformer_y.transform(y)

#LASSO
lasso = Lasso()
lasso = lasso.fit(X_scal, y_scal)

#LASSO info
print('Score: ', lasso.score(X_scal,y_scal))
print('Raw Intercept: ', lasso.intercept_.round(2)[0]) 
intercept = transformer_y.inverse_transform([lasso.intercept_])[0][0]
print('Unscaled Intercept: ', intercept) 
print('\nCoefficients Used: ')
coeff_array = lasso.coef_
inverse_coeff_array = transformer_x.inverse_transform(lasso.coef_.reshape(1,-1))[0]
for i,j,k in zip(X.columns, coeff_array, inverse_coeff_array):
    if j != 0:
        print(i, j.round(2), k.round(2))

#Predictions
example = [[3,1,1,1]]
pred = lasso.predict(example)
pred_scal = transformer_y.inverse_transform(pred.reshape(-1, 1))
print('\nRaw Prediction where X1 = 3: ', pred[0])
print('Unscaled Prediction where X1 = 3: ', pred_scal[0][0])

#Predictions without using the .predict function 
def lasso_predict_value_(X1,X2,X3,X4): 
    print('intercept: ', intercept)
    print('coef: ', inverse_coeff_array[0])
    print('X1: ', X1)
    preds = intercept + inverse_coeff_array[0]*X1
    print('Your predicted value is: ', preds)

lasso_predict_value_(3,1,1,1)

1 Answer:

Answer 0 (score: 2)

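The trained Lasso has no information about whether a given data point has been scaled. Hence, your manual prediction should not apply the scalers to the model's intercept and coefficients.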
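To see why passing coefficients through the scaler is not meaningful, here is a minimal sketch of what RobustScaler does with its default settings. The small array `X` below is made up for illustration and is not the question's data. `transform` computes `(x - center_) / scale_` and `inverse_transform` computes `v * scale_ + center_`, so feeding a coefficient vector through `inverse_transform` adds each feature's median to it.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Illustrative two-feature data (an assumption, not the question's DataFrame).
X = np.array([[1.0, 6.0], [-2.0, 5.0], [2.0, 4.0],
              [0.1, 3.0], [0.5, 2.0], [3.0, 1.0]])
scaler = RobustScaler().fit(X)

# transform subtracts the per-column median and divides by the per-column IQR.
manual_scaled = (X - scaler.center_) / scaler.scale_

# inverse_transform multiplies by the IQR and adds the median back.
# An all-zero "coefficient" vector therefore comes back as the column medians,
# which is clearly not a coefficient in original units.
zero_coefs = np.zeros((1, X.shape[1]))
shifted = scaler.inverse_transform(zero_coefs)

print(manual_scaled[:2])
print(shifted)
```

The scaler is a transformation for data points, so it should only be applied to model inputs and outputs.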

If I remove your processing of the model coefficients, the manual calculation gives the same result as the sklearn model:


example = [[3,1,1,1]]
lasso.predict(example)

# array([0.07533937])


#Predictions without using the .predict function 
def lasso_predict_value_(X1,X2,X3,X4): 
    x_test = np.array([X1,X2, X3, X4])
    preds = lasso.intercept_ + sum(x_test*lasso.coef_)
    print('Your predicted value is: ', preds)


lasso_predict_value_(3,1,1,1)

# Your predicted value is:  [0.07533937]

Update 2:

  

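"Once I use LASSO, I need to see the predictions in their original units. My dependent variable is a dollar amount, and if I don't inverse transform back, I can't see how many dollars I need to predict."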

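This is a perfectly valid scenario. You need to apply transformer_y.inverse_transform to get the unscaled dollar amount. There is no need to tamper with the model weights.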
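Written out, the whole prediction is a chain of three steps. The notation here is my own shorthand rather than anything from sklearn: m_j and s_j stand for the median and interquartile range that transformer_x learned for feature j, m_y and s_y are the same quantities from transformer_y, and b and w_j are lasso.intercept_ and lasso.coef_. This assumes RobustScaler's default centering and scaling.

```latex
\begin{aligned}
x'_j &= \frac{x_j - m_j}{s_j} && \text{(scale the input)} \\
y' &= b + \sum_j w_j \, x'_j && \text{(linear prediction in scaled units)} \\
\hat{y} &= m_y + s_y \, y' && \text{(convert back to dollars)}
\end{aligned}
```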

Updated example

example = [[3,1,1,1]]
scaled_pred = lasso.predict(transformer_x.transform(example))
transformer_y.inverse_transform([scaled_pred])
# array([[4.07460407]])

#Predictions without using the .predict function 
def lasso_predict_value_(X1,X2,X3,X4): 
    x_test = transformer_x.transform(np.array([X1,X2, X3, X4]).reshape(1,-1))[0]
    preds = lasso.intercept_ + sum(x_test*lasso.coef_)
    print('Your predicted value is: ', preds)
    # Inverse-transform the value computed in this function, not the global scaled_pred
    print('Your unscaled predicted value is: ', 
          transformer_y.inverse_transform([preds]))


lasso_predict_value_(3,1,1,1)
# Your predicted value is:  [0.0418844]    
# Your unscaled predicted value is:  [[4.07460407]]
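If you also want an intercept and coefficients expressed in original units, which is what the question's inverse_transform calls were aiming for, they can be derived by folding the scalers' medians and IQRs into the weights. The following is a sketch under the assumption that both scalers use RobustScaler's defaults; the names `coef_orig`, `intercept_orig` and `predict_in_dollars` are my own and not part of sklearn.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.preprocessing import RobustScaler

df = pd.DataFrame({'Y': [5, -10, 10, .5, 2.5, 15],
                   'X1': [1., -2., 2., .1, .5, 3],
                   'X2': [1, 1, 2, 1, 1, 1],
                   'X3': [6, 6, 6, 5, 6, 4],
                   'X4': [6, 5, 4, 3, 2, 1]})
X = df[['X1', 'X2', 'X3', 'X4']].to_numpy(dtype=float)
y = df[['Y']].to_numpy(dtype=float)

transformer_x = RobustScaler().fit(X)
transformer_y = RobustScaler().fit(y)
lasso = Lasso().fit(transformer_x.transform(X), transformer_y.transform(y))

# Medians and IQRs learned by the scalers.
cx, sx = transformer_x.center_, transformer_x.scale_
cy, sy = transformer_y.center_[0], transformer_y.scale_[0]

w = np.ravel(lasso.coef_)
b = float(np.ravel(lasso.intercept_)[0])

# Substituting x' = (x - cx) / sx and y = cy + sy * y' into y' = b + w . x'
# and collecting terms gives a linear model in original units.
coef_orig = sy * w / sx
intercept_orig = cy + sy * (b - np.sum(w * cx / sx))

def predict_in_dollars(x_row):
    """Manual prediction in original units, with no scaler or .predict call."""
    return intercept_orig + float(np.dot(coef_orig, np.asarray(x_row, dtype=float)))

example = [3, 1, 1, 1]
manual = predict_in_dollars(example)
reference = transformer_y.inverse_transform(
    lasso.predict(transformer_x.transform([example])).reshape(-1, 1))[0, 0]
print(manual, reference)
```

The two printed values should agree, since `predict_in_dollars` is just the scale, predict, inverse-transform pipeline rearranged algebraically.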