I have a regression problem, and I tried both ordinary linear regression and a random forest. Here is my code:
from sklearn import linear_model
from sklearn.preprocessing import StandardScaler
clf = linear_model.LinearRegression()
clf.fit(X, y)
print('Raw Coefficients: \n', clf.coef_)
print('Score: \n', clf.score(X, y))
# Now we normalise the data
scalerX = StandardScaler().fit(X)
scalery = StandardScaler().fit(y)  # a 1-D y would need y.reshape(-1, 1) here
normed_X = scalerX.transform(X)
normed_y = scalery.transform(y)  # same reshape caveat; normed_y is not used below
scaledclf = linear_model.LinearRegression()
scaledclf.fit(normed_X, y)  # fit on scaled X against the original, unscaled y
print('Scaled Coefficients: \n', scaledclf.coef_)
print('Score: \n', scaledclf.score(normed_X, y))
from sklearn.ensemble import RandomForestRegressor
regr = RandomForestRegressor()
regr.fit(X, y)
print('Feature importance: \n')
print(regr.feature_importances_)
print('Forest Score: \n', regr.score(X, y))
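From the output, I can see that linear regression (scaled or raw) has an R2 score of 0.27, while the random forest gives me 0.9. (According to the documentation, the best possible score is 1.)
In this case, can I say that my data is better suited to random forest regression than to linear regression? (I think this means my data is not linear and can be fit better by a nonlinear model.)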
Raw Coefficients:
[[-2.46128236 6.50261042 4.23066487 0.16846074 -0.42161622 0.52332136]
[ 8.7998738 19.23413227 58.76010742 1.02298612 -3.28209941 -2.99637104]]
Score:
0.27018867990736034
Scaled Coefficients:
[[-0.83667512 2.32189634 2.03873375 1.47020538 -1.27457093 0.38564757]
[ 2.99138188 6.86795895 28.31616729 8.92789452 -9.92198189 -2.20809484]]
Score:
0.27018867990736056
Feature importance:
[0.04388598 0.0329269 0.18755359 0.35849597 0.31578241 0.06135516]
Forest Score:
0.9003045525566503
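Both scores above are computed on the same X and y that the models were fit on. One check I could run before drawing a conclusion is to compare the two models on held-out data instead. The sketch below does that with `cross_val_score`. Note that `X_demo` and `y_demo` are synthetic stand-ins I made up (6 features, 2 targets, deliberately nonlinear), not my real data, and `n_estimators=100` and `cv=5` are arbitrary choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real X and y (assumption: 6 features, 2 targets).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 6))
y_demo = np.column_stack([
    np.sin(X_demo[:, 3]) + X_demo[:, 4] ** 2,  # nonlinear in features 3 and 4
    X_demo[:, 2] * X_demo[:, 3],               # interaction of features 2 and 3
]) + 0.1 * rng.normal(size=(300, 2))

models = {
    "linear": LinearRegression(),
    "forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

train_scores = {}
cv_scores = {}
for name, model in models.items():
    # R2 on the same data used for fitting (what the code in the question reports)
    train_scores[name] = model.fit(X_demo, y_demo).score(X_demo, y_demo)
    # Mean R2 over 5 held-out folds; each fold is scored by a model that never saw it
    cv_scores[name] = cross_val_score(
        model, X_demo, y_demo, cv=5, scoring="r2"
    ).mean()
    print(name, "train R2:", train_scores[name], "held-out R2:", cv_scores[name])
```

If the forest still scores clearly higher than the linear model on the held-out folds of the real data, that would support the idea that the relationship is nonlinear. If the gap shrinks a lot, the 0.9 is at least partly the forest memorising the training set.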