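I have built a random forest regression model with sklearn. I tried to check R2 on the validation data for different hyperparameter values using my own code below, which uses KFold. The score is about 0.95. However, when I use validation_curve from sklearn, the R2 score is about 0.6.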
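import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# kflod (the KFold splitter), rmsle, n_estimator_range, X_train_3 and Y_train are defined earlier
c = []  # mean RMSLE for each n_estimators value
d = []  # mean R2 for each n_estimators value
for j in n_estimator_range:
    a = []  # per-fold RMSLE
    b = []  # per-fold R2
    for i, (train_indx, val_index) in enumerate(kflod.split(X_train_3, Y_train)):
        # Split the data into the training and validation parts of this fold
        x_s = X_train_3.iloc[train_indx]
        y_s = Y_train.iloc[train_indx]
        x_v = X_train_3.iloc[val_index]
        y_v = Y_train.iloc[val_index]
        rfModel = RandomForestRegressor(n_estimators=j, oob_score=True)
        rfModel.fit(x_s, y_s)
        y_pred = rfModel.predict(x_v)
        a.append(rmsle(y_v, y_pred))
        b.append(r2_score(y_v, y_pred))
    c.append(np.mean(a))
    print(np.mean(a))
    d.append(np.mean(b))
    print(np.mean(b))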
The scores are all around 0.95.
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import validation_curve

# Calculate R2 on the training and validation folds for each value in the parameter range
train_scores, test_scores = validation_curve(
    estimator=RandomForestRegressor(),
    X=X_train_3,
    y=Y_train,
    param_name="n_estimators",
    param_range=n_estimator_range,
    cv=3,
    scoring='r2',
)
The R2 score is about 0.6.
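
One thing that may be worth checking is whether the two runs use the same splits. With an integer cv and a regressor, validation_curve uses KFold without shuffling, while the construction of kflod is not shown above. The sketch below is only an illustration on synthetic data (the data, the shuffle=True setting and the helper name mean_val_r2 are all assumptions, not taken from the code above); it shows how row order alone can separate the two scores when the rows are sorted.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, validation_curve

# Synthetic, hypothetical data: rows are sorted by the single feature,
# so contiguous (unshuffled) folds cover disjoint ranges of the target.
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 10, size=300)).reshape(-1, 1)
y = 2.0 * X.ravel() + rng.normal(scale=0.5, size=300)

param_range = [10, 50]


def mean_val_r2(cv):
    """Mean validation R2 per n_estimators value for a given cv setting."""
    _, val_scores = validation_curve(
        estimator=RandomForestRegressor(random_state=0),
        X=X,
        y=y,
        param_name="n_estimators",
        param_range=param_range,
        cv=cv,
        scoring="r2",
    )
    return val_scores.mean(axis=1)


# cv=3 means KFold(n_splits=3) without shuffling for a regressor
r2_unshuffled = mean_val_r2(3)

# An explicit splitter with shuffling (assumed here; kflod may differ)
r2_shuffled = mean_val_r2(KFold(n_splits=3, shuffle=True, random_state=0))

print("cv=3 (no shuffle):", r2_unshuffled)
print("shuffled KFold:   ", r2_shuffled)
```

If the scores diverge in a similar way on the real data, passing the same splitter to both approaches (cv=kflod in validation_curve) should make the two R2 values directly comparable.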