Question

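Using the data in my original df, I got some decent results with sklearn's LinearRegression. But I read that when the values are on very different scales, it is better to normalize or standardize them. I did this with StandardScaler, but the results make no sense. Can you find where my mistake is?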
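
For context, standardizing a feature means subtracting its mean and dividing by its standard deviation, so every column ends up with mean 0 and unit variance. A minimal sketch on a made-up array (the name `X_toy` and its values are illustrative assumptions, not the contents of insurance2.csv):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy features on different scales (think: age and bmi).
X_toy = np.array([[19.0, 27.9], [18.0, 33.77], [28.0, 33.0], [33.0, 22.705]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_toy)

# StandardScaler uses the population standard deviation (ddof=0),
# which is also NumPy's default for .std().
X_manual = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

print(np.allclose(X_scaled, X_manual))   # both computations agree
print(X_scaled.mean(axis=0).round(6))    # column means after scaling
print(X_scaled.std(axis=0).round(6))     # column standard deviations after scaling
```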

import pandas as pd
insurance = pd.read_csv("./insurance2.csv")
insurance.head()


   age  sex     bmi  children  smoker      charges
0   19    1  27.900         0       1  16884.92400
1   18    0  33.770         1       0   1725.55230
2   28    0  33.000         3       0   4449.46200
3   33    0  22.705         0       0  21984.47061
4   32    0  28.880         0       0   3866.85520

X = insurance.drop(['charges'], axis = 1)   
y = insurance.charges    

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 0)

Linear regression on the raw data

from sklearn.linear_model import LinearRegression  
MLR = LinearRegression(fit_intercept=False)  
MLR.fit(X_train, y_train) 
y_pred_MLR = MLR.predict(X_test)

results = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred_MLR})
results



          Actual     Predicted
1247   1633.9618   9692.885550
609    8547.6913  14503.324553
393    9290.1395  15207.774334
503   32548.3405   8435.748430
198    9644.2525  13019.558091
...          ...           ...
823   12523.6048  15787.241603
969   10702.6424  16050.896025
1326   9377.9047  15409.298487
792   21195.8180  13856.597356
634   14410.9321  18706.442066

from sklearn.metrics import r2_score
accuracy_MLR = r2_score(y_test, y_pred_MLR)
print("Accuracy with MLR: ", accuracy_MLR)
Accuracy with MLR:  0.13064545399288352

from math import sqrt
from sklearn.metrics import mean_squared_error

RMSE_MLR = sqrt(mean_squared_error(y_test, y_pred_MLR))
print("RMSE for Testing Data: ", RMSE_MLR)
RMSE for Testing Data:  11776.684328389369

Linear regression on the standardized data

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X2_train = sc.fit_transform(X_train)
X2_test = sc.transform(X_test)

MLR2 = LinearRegression(fit_intercept=False)  
MLR2.fit(X2_train, y_train)  

y_pred_MLR2 = MLR2.predict(X2_test)

results2 = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred_MLR2})
results2

          Actual     Predicted
1247   1633.9618  -7937.729692
609    8547.6913  -3397.232400
393    9290.1395  -2282.260721
503   32548.3405  13500.743967
198    9644.2525  -6194.481022

from sklearn.metrics import r2_score

accuracy_MLR2 = r2_score(y_test, y_pred_MLR2)
print("Accuracy with MLR: ", accuracy_MLR2)
Accuracy with MLR:  -1.0909061875014818

from math import sqrt
from sklearn.metrics import mean_squared_error

RMSE_MLR2 = sqrt(mean_squared_error(y_test, y_pred_MLR2))
print("RMSE for Testing Data: ", RMSE_MLR2)
RMSE for Testing Data:  18263.829458461096
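
One thing that may be worth checking is the `fit_intercept=False` argument. After standardization every feature column has mean 0, so a model without an intercept can only produce predictions centered near 0, while `charges` averages far above that. The sketch below illustrates the effect on synthetic data; the generated features, the coefficients, and the offset of 13000 are assumptions made up for illustration, not values from insurance2.csv.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): two features on different scales
# and a target with a large constant offset, loosely like charges.
X = np.column_stack([rng.uniform(18, 64, 500), rng.uniform(16, 50, 500)])
y = 13000 + 250 * X[:, 0] + 300 * X[:, 1] + rng.normal(0, 1000, 500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the scaler on the training split only, then reuse it on the test split.
sc = StandardScaler()
X_train_s = sc.fit_transform(X_train)
X_test_s = sc.transform(X_test)

# Same scaled data, with and without an intercept term.
no_icpt = LinearRegression(fit_intercept=False).fit(X_train_s, y_train)
with_icpt = LinearRegression(fit_intercept=True).fit(X_train_s, y_train)

pred_no = no_icpt.predict(X_test_s)
pred_with = with_icpt.predict(X_test_s)

print("mean of y_test:                 ", y_test.mean())
print("mean prediction, no intercept:  ", pred_no.mean())
print("mean prediction, with intercept:", pred_with.mean())
print("R2, no intercept:  ", r2_score(y_test, pred_no))
print("R2, with intercept:", r2_score(y_test, pred_with))
```

If the same pattern shows up on the insurance data (predictions shifted down by roughly the mean of `y_train`), refitting `MLR2` with the default `fit_intercept=True` would be a reasonable next experiment.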

Standardization problem with a linear regression model in Python

0 answers: