Using cross_val_predict against test data set

Time: 2017-01-10 02:28:28

Tags: python machine-learning scikit-learn data-science

I'm confused about using cross_val_predict on a test data set.

I created a simple Random Forest model and used cross_val_predict to make predictions:

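import pandas as pd
from sklearn.ensemble import RandomForestClassifier
# sklearn.cross_validation was removed in scikit-learn 0.20; these now live in sklearn.model_selection
from sklearn.model_selection import cross_val_predict, KFold

lr = RandomForestClassifier(random_state=1, class_weight="balanced", n_estimators=25, max_depth=6)
# The old KFold(n_samples, ...) defaulted to 3 folds without shuffling; random_state has no effect without shuffle
kf = KFold(n_splits=3)
predictions = cross_val_predict(lr, train_df[features_columns], train_df["target"], cv=kf)
predictions = pd.Series(predictions)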

I'm confused about the next step: how do I use what was learned above to make predictions on the test data set?

2 answers:

Answer 0 (score: 2)

As @DmitryPolonskiy commented, the model has to be trained (with the fit method) before it can be used to predict.
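A minimal sketch of what happens if you skip fit, assuming a small random dataset as a hypothetical stand-in for train_df: scikit-learn raises NotFittedError when predict is called on an estimator that has not been fitted, and predict works once fit has run.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.exceptions import NotFittedError

# Hypothetical toy data standing in for train_df[features_columns] / train_df["target"]
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (X[:, 0] > 0).astype(int)

model = RandomForestClassifier(n_estimators=10, random_state=1)

# Calling predict before fit raises NotFittedError
try:
    model.predict(X)
    raised = False
except NotFittedError:
    raised = True
print("predict before fit raised NotFittedError:", raised)

# After fit, predict returns one label per input row
model.fit(X, y)
y_pred = model.predict(X)
print(y_pred.shape)
```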

# Train the model (a.k.a. `fit` training data to it).
lr.fit(train_df[features_columns], train_df["target"])
# Use the model to make predictions based on testing data.
y_pred = lr.predict(test_df[features_columns])
# Compare the predicted y values to actual y values.
accuracy = (y_pred == test_df["target"]).mean()
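To try the fit/predict steps end to end, here is a self-contained sketch. The data is synthetic (make_classification plus train_test_split), and the column names `f0`…`f4` are hypothetical; only the model settings and the `features_columns`/`target` naming come from the question.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical data standing in for the question's train_df / test_df
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
features_columns = [f"f{i}" for i in range(X.shape[1])]
df = pd.DataFrame(X, columns=features_columns)
df["target"] = y
train_df, test_df = train_test_split(df, test_size=0.25, random_state=0)

lr = RandomForestClassifier(random_state=1, class_weight="balanced", n_estimators=25, max_depth=6)

# 1. Fit on the training split only
lr.fit(train_df[features_columns], train_df["target"])

# 2. Predict on the held-out test split
y_pred = lr.predict(test_df[features_columns])

# 3. Compare predictions with the true labels
accuracy = (y_pred == test_df["target"].to_numpy()).mean()
print(f"test accuracy: {accuracy:.3f}")
```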

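cross_val_predict is a cross-validation tool that helps you assess how well the model performs: for each training sample it returns the prediction made by a copy of the estimator fit on the other folds. It does not leave `lr` itself fitted, so it cannot be used directly to predict on new data. Take a look at sklearn's cross-validation page.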
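The following sketch illustrates that point on synthetic data (make_classification is an assumption standing in for the real training set). It shows that cross_val_predict returns one out-of-fold prediction per training row, and that the estimator you pass in is still unfitted afterwards because scikit-learn fits clones of it internally.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.exceptions import NotFittedError
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.utils.validation import check_is_fitted

# Hypothetical training data
X_train, y_train = make_classification(n_samples=200, n_features=5, random_state=0)

lr = RandomForestClassifier(random_state=1, n_estimators=25, max_depth=6)
kf = KFold(n_splits=3)

# Each training row is predicted by a clone of lr fit on the other two folds
oof_pred = cross_val_predict(lr, X_train, y_train, cv=kf)
print(oof_pred.shape)  # one prediction per training row
print("out-of-fold accuracy:", accuracy_score(y_train, oof_pred))

# lr itself was never fitted: cross_val_predict works on clones
try:
    check_is_fitted(lr)
    lr_is_fitted = True
except NotFittedError:
    lr_is_fitted = False
print("lr fitted after cross_val_predict:", lr_is_fitted)
```

This is why a separate `lr.fit(...)` call is still needed before `lr.predict(test_df[...])`.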

Answer 1 (score: 2)

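I think cross_val_score and cross_val_predict do not need a separate fit before predicting: they do the fitting on the fly. If you look at the documentation (section 3.1.1.1), you'll see that it never mentions fit anywhere.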
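A sketch of the "on the fly" fitting, again on hypothetical synthetic data: cross_val_score and the closely related cross_validate are called without any prior fit and fit a fresh clone on each training fold. cross_validate with return_estimator=True (available since scikit-learn 0.20) also hands back those fitted fold models. The final refit on all training data is one common way to get a single model for the test set, not the only one.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score, cross_validate, train_test_split

# Hypothetical data split into train and test
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

lr = RandomForestClassifier(random_state=1, n_estimators=25, max_depth=6)
kf = KFold(n_splits=3)

# No explicit fit: each call fits a clone of lr on every training fold
scores = cross_val_score(lr, X_train, y_train, cv=kf)
print("cross_val_score per fold:", scores)

cv_results = cross_validate(lr, X_train, y_train, cv=kf, return_estimator=True)
fold_models = cv_results["estimator"]  # fitted clones, one per fold
fold_test_preds = [m.predict(X_test) for m in fold_models]

# Common workflow: once the CV estimate looks acceptable, refit on all training data
lr.fit(X_train, y_train)
test_accuracy = lr.score(X_test, y_test)
print("test accuracy:", test_accuracy)
```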