Question

I have a regression problem, and I tried both ordinary linear regression and a random forest. Here is my code:

from sklearn import linear_model
from sklearn.preprocessing import StandardScaler

# Plain linear regression on the raw features
clf = linear_model.LinearRegression()

clf.fit(X, y)

print('Raw Coefficients: \n', clf.coef_)
print('Score: \n', clf.score(X, y))


# Now we normalise the data
scalerX = StandardScaler().fit(X)
scalery = StandardScaler().fit(y)  # y appears to be 2-D (two targets); a 1-D y would need reshape(-1, 1)

normed_X = scalerX.transform(X)
normed_y = scalery.transform(y)  # computed, but the fit below still uses the unscaled y
scaledclf = linear_model.LinearRegression()

# Linear regression on the standardised features
scaledclf.fit(normed_X, y)
print('Scaled Coefficients: \n', scaledclf.coef_)
print('Score: \n', scaledclf.score(normed_X, y))

# Random forest on the raw features (scaling is not needed for trees)
from sklearn.ensemble import RandomForestRegressor
regr = RandomForestRegressor()
regr.fit(X, y)
print('Feature importance: \n')
print(regr.feature_importances_)
print('Forest Score: \n', regr.score(X, y))

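From the output, I can see that linear regression (scaled or raw) gets an R² score of 0.27, but the random forest gives me 0.9. (According to the documentation, the best possible score is 1.)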
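For reference, the R² returned by `score` is the coefficient of determination, 1 - SS_res / SS_tot. The following is a minimal sketch of that formula for a single target, written in plain Python; the function name `r2_score_manual` and the toy numbers are made up for illustration and are not part of the code above. With two target columns, as the two rows of coefficients here suggest, my understanding is that scikit-learn averages the per-target values, but treat that as an assumption to check against the documentation.

```python
def r2_score_manual(y_true, y_pred):
    """Coefficient of determination for one target: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1.0 - ss_res / ss_tot


# Toy values (hypothetical, for illustration only)
y_true = [1.0, 2.0, 3.0, 4.0]

perfect = r2_score_manual(y_true, y_true)                  # predictions match exactly
baseline = r2_score_manual(y_true, [2.5, 2.5, 2.5, 2.5])   # always predict the mean
reversed_fit = r2_score_manual(y_true, [4.0, 3.0, 2.0, 1.0])  # worse than the mean

print(perfect, baseline, reversed_fit)
```

So 1 is the upper bound, 0 corresponds to always predicting the mean of `y`, and the score can go negative for a model that does worse than that.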

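In this case, can I say that random forest regression fits my data better than linear regression? (I think this means my data is not linear and can be fitted better by a nonlinear model.)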

Raw Coefficients:
 [[-2.46128236  6.50261042  4.23066487  0.16846074 -0.42161622  0.52332136]
 [ 8.7998738  19.23413227 58.76010742  1.02298612 -3.28209941 -2.99637104]]
Score:
 0.27018867990736034
Scaled Coefficients:
 [[-0.83667512  2.32189634  2.03873375  1.47020538 -1.27457093  0.38564757]
 [ 2.99138188  6.86795895 28.31616729  8.92789452 -9.92198189 -2.20809484]]
Score:
 0.27018867990736056
Feature importance:

[0.04388598 0.0329269  0.18755359 0.35849597 0.31578241 0.06135516]
Forest Score:
 0.9003045525566503

Can I say that the random forest is a better fit than linear regression?
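One thing worth keeping in mind when reading these scores: every `score(X, y)` call above is evaluated on the same data the model was fitted on. A random forest with default settings can reproduce its training data very closely, so a high training R² may say little about how well it generalises. Below is a minimal sketch of a held-out comparison. It uses synthetic pure-noise data as a stand-in (an assumption, since the real `X` and `y` are not shown), and the names `X_demo`, `y_demo`, and `scores` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 6 features and a target that is pure
# noise, so there is no real relationship for any model to learn.
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(300, 6))
y_demo = rng.normal(size=300)

# Hold out 30% of the rows; the models never see them during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

models = {
    "linear": LinearRegression(),
    "forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_r2 = model.score(X_train, y_train)  # R² on the data used for fitting
    test_r2 = model.score(X_test, y_test)     # R² on unseen data
    scores[name] = (train_r2, test_r2)
    print(f"{name}: train R2 = {train_r2:.3f}, test R2 = {test_r2:.3f}")
```

On noise like this, the forest's training R² should come out high while its held-out R² stays near zero or below. Running the same split (or `cross_val_score`) on the real data and comparing the held-out scores of both models would be a fairer basis for deciding which one fits better.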

0 answers: