I am running a bunch of models with scikit-learn to solve a classification problem.
This is the code that should do all of the runs:
for model_name, classifier, param_grid, cv, cv_name in tqdm(zip(model_names, classifiers, param_grids, cvs, cv_names)):
    pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                               ('classifier', classifier)])
    train_and_score_model(model_name, pipeline, param_grid, cv=cv)
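My question is: how do I keep the output of the train_and_score_model function? It returns a cv object, which is the model.
What I tried was to create a list cv_names = ['dm_cv', 'lr_cv', 'knn_cv', 'svm_cv', 'dt_cv', 'rf_cv', 'nb_cv'] and have the for loop run over it, but I don't think that is right. That is the cv_name iterator in the for loop header.
I don't think it is right because wouldn't I be assigning to a string rather than to a variable? What I really should have is cv_names = [dm_cv, lr_cv, knn_cv, svm_cv, dt_cv, rf_cv, nb_cv], but I don't think I can have a list like that, since those variables do not exist yet.
Another approach I thought of is to save each model in a dictionary, where the keys would be the elements of the list I outlined above. I don't know whether I can have models as dictionary values.
This is the clunky, repetitive code I currently run, which does what I want the for loop to do: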
pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                           ('classifier', classifier_dm)])
dm_cv = train_and_score_model('Dummy Model', pipeline, param_grid_dm)
pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                           ('classifier', classifier_lr)])
lr_cv = train_and_score_model('Logistic Regression', pipeline, param_grid_lr)
pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                           ('classifier', classifier_knn)])
knn_cv = train_and_score_model('K Nearest Neighbors', pipeline, param_grid_knn)
pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                           ('classifier', classifier_svm)])
svm_cv = train_and_score_model('Support Vector Machine', pipeline, param_grid_svm)
pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                           ('classifier', classifier_dt)])
dt_cv = train_and_score_model('Decision Tree', pipeline, param_grid_dt)
pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                           ('classifier', classifier_rf)])
rf_cv = train_and_score_model('Random Forest', pipeline, param_grid_rf)
pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                           ('classifier', classifier_nb)])
nb_cv = train_and_score_model('Naive Bayes', pipeline, param_grid_nb)
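Answer 0 (score: 1)
You can create a dictionary that maps each classifier name to its information, i.e. the classifier object and its parameter grid: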
models_list = {'Logistic Regression': (classifier_lr, param_grid_lr),
               'K Nearest Neighbours': (classifier_knn, param_grid_knn)}
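Then iterate over each key-value pair in the dictionary, build the pipeline, and store each returned model under its name: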
model_cvs = {}
for model_name, model_info in models_list.items():
    pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                               ('classifier', model_info[0])])
    model_cvs[model_name] = train_and_score_model(model_name, pipeline, model_info[1])
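The same idea should also work with the parallel lists from the question: the strings in cv_names become dictionary keys rather than variable names, and any Python object, including a fitted model, can be a dictionary value. The sketch below is only an illustration that runs without scikit-learn or data. FakeSearchResult and this stub version of train_and_score_model are hypothetical stand-ins for the real function and the object it returns, and the plain strings in pipelines stand in for real Pipeline objects.

```python
from dataclasses import dataclass


@dataclass
class FakeSearchResult:
    """Hypothetical stand-in for the fitted search object the real function returns."""
    model_name: str
    best_score_: float


def train_and_score_model(model_name, pipeline, param_grid, cv=5):
    """Hypothetical stub: the real function would fit and score the pipeline."""
    # Made-up score so the example runs without data or scikit-learn.
    return FakeSearchResult(model_name, best_score_=len(param_grid) / 10)


# Parallel lists as in the question; strings stand in for the real pipelines.
model_names = ['Dummy Model', 'Logistic Regression']
pipelines = ['pipeline_dm', 'pipeline_lr']
param_grids = [{}, {'classifier__C': [0.1, 1, 10]}]
cv_names = ['dm_cv', 'lr_cv']

# Each string in cv_names is used as a dictionary key, not as a variable name.
results = {}
for model_name, pipeline, param_grid, cv_name in zip(
        model_names, pipelines, param_grids, cv_names):
    results[cv_name] = train_and_score_model(model_name, pipeline, param_grid)

# Look up a single result by key, or compare all stored results at once.
lr_cv = results['lr_cv']
best_key = max(results, key=lambda name: results[name].best_score_)
print(lr_cv.model_name, best_key)
```

With the real function, results['lr_cv'] would hold whatever train_and_score_model returns for that run, so it can be used wherever the separate lr_cv variable was used before.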