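Using the data in my original df, I got some decent results with sklearn's linear regression. But I read that when the features are on very different scales, it is better to normalize or standardize them. I did that with StandardScaler, but the results make no sense. Can you spot where my mistake is?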
import pandas as pd
insurance = pd.read_csv("./insurance2.csv")
insurance.head()
   age  sex     bmi  children  smoker      charges
0   19    1  27.900         0       1  16884.92400
1   18    0  33.770         1       0   1725.55230
2   28    0  33.000         3       0   4449.46200
3   33    0  22.705         0       0  21984.47061
4   32    0  28.880         0       0   3866.85520
X = insurance.drop(['charges'], axis = 1)
y = insurance.charges
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 0)
Linear regression on the raw data
from sklearn.linear_model import LinearRegression
MLR = LinearRegression(fit_intercept=False)
MLR.fit(X_train, y_train)
y_pred_MLR = MLR.predict(X_test)
results = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred_MLR})
results
          Actual     Predicted
1247   1633.9618   9692.885550
609    8547.6913  14503.324553
393    9290.1395  15207.774334
503   32548.3405   8435.748430
198    9644.2525  13019.558091
...          ...           ...
823   12523.6048  15787.241603
969   10702.6424  16050.896025
1326   9377.9047  15409.298487
792   21195.8180  13856.597356
634   14410.9321  18706.442066
from sklearn.metrics import r2_score
accuracy_MLR = r2_score(y_test, y_pred_MLR)
print("Accuracy con MLR: ", accuracy_MLR)
Accuracy con MLR: 0.13064545399288352
from math import sqrt
from sklearn.metrics import mean_squared_error
RMSE_MLR = sqrt(mean_squared_error(y_test, y_pred_MLR))
print("RMSE for Testing Data: ", RMSE_MLR)
RMSE for Testing Data: 11776.684328389369
Linear regression on the standardized data
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X2_train = sc.fit_transform(X_train)
X2_test = sc.transform(X_test)
MLR2 = LinearRegression(fit_intercept=False)
MLR2.fit(X2_train, y_train)
y_pred_MLR2 = MLR2.predict(X2_test)
results2 = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred_MLR2})
results2
          Actual     Predicted
1247   1633.9618  -7937.729692
609    8547.6913  -3397.232400
393    9290.1395  -2282.260721
503   32548.3405  13500.743967
198    9644.2525  -6194.481022
from sklearn.metrics import r2_score
accuracy_MLR2 = r2_score(y_test, y_pred_MLR2)
print("Accuracy con MLR: ", accuracy_MLR2)
Accuracy con MLR: -1.0909061875014818
from math import sqrt
from sklearn.metrics import mean_squared_error
RMSE_MLR2 = sqrt(mean_squared_error(y_test, y_pred_MLR2))
print("RMSE for Testing Data: ", RMSE_MLR2)
RMSE for Testing Data: 18263.829458461096
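One thing that may be worth checking is how fit_intercept=False interacts with StandardScaler. The scaler centers every feature at zero, so a model with no intercept can only produce predictions that average to roughly zero, which would be consistent with the negative predicted charges above. The sketch below illustrates this on synthetic data only (the feature ranges, coefficients, and noise level are made up for illustration and are not taken from insurance2.csv); it is not a confirmed diagnosis of the results shown here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): two features and a target with a large offset,
# loosely imitating a "charges"-like column that is far from zero on average.
rng = np.random.default_rng(0)
X = rng.uniform(18, 64, size=(200, 2))
y = 5000.0 + 250.0 * X[:, 0] + 100.0 * X[:, 1] + rng.normal(0, 500, size=200)

# StandardScaler subtracts each column's mean, so every scaled column has mean 0.
X_scaled = StandardScaler().fit_transform(X)

# Same scaled data, fitted without and with an intercept.
no_intercept = LinearRegression(fit_intercept=False).fit(X_scaled, y)
with_intercept = LinearRegression(fit_intercept=True).fit(X_scaled, y)

# Without an intercept the predictions are X_scaled @ coef, whose mean is ~0,
# so the model cannot reach the average level of y.
mean_pred_no_intercept = no_intercept.predict(X_scaled).mean()
mean_pred_with_intercept = with_intercept.predict(X_scaled).mean()

# R^2 on the same data for both variants.
r2_no_intercept = no_intercept.score(X_scaled, y)
r2_with_intercept = with_intercept.score(X_scaled, y)

print("mean of y:                 ", y.mean())
print("mean pred, no intercept:   ", mean_pred_no_intercept)
print("mean pred, with intercept: ", mean_pred_with_intercept)
print("R^2, no intercept:         ", r2_no_intercept)
print("R^2, with intercept:       ", r2_with_intercept)
```

If the same pattern holds for the insurance data, refitting MLR2 with the default fit_intercept=True on X2_train would be a quick way to test it.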