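Given a bank marketing dataset, I trained a model using decision tree and random forest classifiers. Now I am trying to predict the target variable Y, but I am not sure how to go about it. Do I need to pass both the training data and the test data to the trained model and run it?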
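import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the cleaned training and test sets
train_data = pd.read_csv('train_cleaned1.csv')
test_data = pd.read_csv('test_cleaned1.csv')
# Separate the features from the target column
X = train_data.drop('Final_Y_1', axis=1)
y = train_data.Final_Y_1
# Hold out 30% of the training data for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.3, random_state=42)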
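# Scale the features, then fit a class-weighted random forest
pipelines = {'rf': make_pipeline(StandardScaler(),
                                 RandomForestClassifier(random_state=42, class_weight='balanced'))}
# 'auto' meant the same as 'sqrt' for classifiers and was removed in
# scikit-learn 1.3, so only 'sqrt' is kept here
rf_hyperparameters = {'randomforestclassifier__n_estimators': [100, 200],
                      'randomforestclassifier__max_features': ['sqrt', 0.33]}
hyperparameters = {'rf': rf_hyperparameters}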
fitted_rf_model = {}
for name, pipeline in pipelines.items():
    # 10-fold cross-validated grid search over the pipeline's hyperparameters
    rf_model = GridSearchCV(pipeline, hyperparameters[name], cv=10,
                            n_jobs=-1)
    rf_model.fit(X_train, y_train)
    fitted_rf_model[name] = rf_model
    print(name, 'has been fitted.')
for name, model in fitted_rf_model.items():
    print(name, model.best_score_)
I still get a decent score, but I am not sure whether the test data has been used at all. If it has not, how do I do that?
rf has been fitted.
rf 0.9004104109304379
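One point that may help here: `best_score_` on a `GridSearchCV` object is the mean cross-validated score computed on `X_train` only, so neither `X_test` nor `test_data` has been touched by the code above. A minimal sketch of how the fitted search object could be applied to both follows. It uses a small synthetic dataset from `make_classification` as a stand-in for the CSV files, and it assumes that the real test file has the same feature columns as the training file and no `Final_Y_1` column. The column names `f0`, `f1`, ... and the reduced grid are hypothetical, chosen only to keep the example quick to run.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for train_cleaned1.csv / test_cleaned1.csv (assumption:
# the real test file has the same feature columns and no 'Final_Y_1' column).
features, target = make_classification(n_samples=400, n_features=8, random_state=42)
columns = [f"f{i}" for i in range(features.shape[1])]
train_data = pd.DataFrame(features[:300], columns=columns)
train_data["Final_Y_1"] = target[:300]
test_data = pd.DataFrame(features[300:], columns=columns)

X = train_data.drop("Final_Y_1", axis=1)
y = train_data["Final_Y_1"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

pipeline = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(random_state=42, class_weight="balanced"))
# Reduced grid and fold count so the sketch runs quickly
param_grid = {"randomforestclassifier__n_estimators": [10, 20]}
rf_model = GridSearchCV(pipeline, param_grid, cv=3)
rf_model.fit(X_train, y_train)

# Mean cross-validated score of the best parameters, on X_train only
print("CV score:", rf_model.best_score_)

# Evaluate on the held-out split; with the default refit=True, predict()
# uses the best pipeline refitted on all of X_train
holdout_pred = rf_model.predict(X_test)
holdout_accuracy = accuracy_score(y_test, holdout_pred)
print("Hold-out accuracy:", holdout_accuracy)

# Predict the target for the separate, unlabeled test file, selecting
# the columns in the same order that was used for training
test_pred = rf_model.predict(test_data[X.columns])
print(test_pred[:10])
```

Because the scaler lives inside the pipeline, `predict` applies the scaling learned from the training data automatically, so the test data does not need to be scaled by hand or fitted again.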