Parallel code seems to run the same inputs on different cores at the same time

Time: 2019-05-23 11:46:04

Tags: python-3.x multiprocessing pool

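I am very new to parallel processing in Python. I managed to run my code in parallel, but I still doubt whether I did it in the most efficient way. First, I split the data as follows: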

import random
import numpy as np

gbm_param_combs = get_cartesian_prod(gbm_params)
random.Random(23).shuffle(gbm_param_combs)
gbm_param_combs = gbm_param_combs[501:506]

# Tag each combination with its position so results can be matched back later.
for counter, param in enumerate(gbm_param_combs):
    param['counter'] = int(counter)

# Split once into 5 chunks (one combination each) instead of calling array_split five times.
gbm_df0, gbm_df1, gbm_df2, gbm_df3, gbm_df4 = np.array_split(gbm_param_combs, 5)
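The split step can also be written without NumPy. The following is a minimal sketch, assuming plain Python lists are sufficient; `split_evenly` is a hypothetical helper (not part of the question's code) that follows the same sizing rule as `np.array_split`, and the parameter dicts are made-up stand-ins for `gbm_param_combs`.

```python
def split_evenly(items, n_chunks):
    """Split items into n_chunks contiguous parts whose sizes differ by at most one.

    Hypothetical helper: the first len(items) % n_chunks parts get one
    extra element, which is the sizing rule np.array_split uses.
    """
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


# Made-up parameter combinations standing in for gbm_param_combs.
param_combs = [{"max_depth": d, "counter": i} for i, d in enumerate([3, 4, 5, 6, 7])]
chunks = split_evenly(param_combs, 5)
```

With five combinations and five chunks, each chunk holds exactly one combination, so each worker would train a single model.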

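Then I wrote a function that takes several inputs so that it can be called concurrently. Inside the function, I fit the model and compute the error score.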

import h2o
from h2o.estimators import H2OGradientBoostingEstimator

def finalFold(params, feats_final, target, h2o_train, h2o_test, fold, pd_scores_final):

    # The frames arrive as pandas DataFrames and are converted back to H2OFrames here.
    h2o_train = h2o.H2OFrame(h2o_train)
    h2o_test = h2o.H2OFrame(h2o_test)

    scores = []
    random_state = 123

    for param in params:

        counter = param.get('counter')
        # ('counter',) is a one-element tuple; ('counter') would be a plain string.
        param = {k: v for k, v in param.items() if k not in ('counter',)}

        print('parameter combination: ', param)
        print('COUNTER: ', counter)

        # define model and fit
        gbm = H2OGradientBoostingEstimator(stopping_rounds=5,
                                           stopping_metric='rmse',
                                           stopping_tolerance=1e-4,
                                           seed=random_state,
                                           **param)

        print('GBM TRAINING STARTS....')
        gbm.train(x=feats_final,
                  y=target,
                  training_frame=h2o_train)

        score = gbm.model_performance(h2o_test).r2()

        # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement there.
        pd_scores_final = pd_scores_final.append({'fold': int(fold),
                                                  'score': score,
                                                  'corr': 0.0,
                                                  'param_idx': int(counter)},
                                                 ignore_index=True)
    return pd_scores_final
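A side note on the key filter inside the function: a parenthesized string such as `('counter')` is not a tuple, so `k not in ('counter')` performs a substring test rather than a membership test. The sketch below illustrates the difference; the keys `ntrees` and `count` are made up for the example and do not come from the question.

```python
param = {"ntrees": 100, "count": 3, "counter": 7}  # made-up keys for illustration

# ('counter') is just the string 'counter', so `in` tests for substrings.
# 'count' is a substring of 'counter' and is dropped along with 'counter'.
substring_filtered = {k: v for k, v in param.items() if k not in ('counter')}

# A trailing comma makes a real one-element tuple, so `in` tests membership.
tuple_filtered = {k: v for k, v in param.items() if k not in ('counter',)}
```

This only matters if some hyperparameter name happens to be a substring of `'counter'`, but the tuple form (or `k != 'counter'`) avoids the surprise.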

Finally, I call the function with starmap, as follows:

import multiprocessing as mp
import pandas as pd

p = mp.Pool(processes=5)

.....

for fold, (train_index, test_index) in enumerate(kfolds.split(pd_data)):

    .....
    # Convert the H2O frames once per fold; every task receives the same pandas copies.
    pd_train = h2o_train.as_data_frame()
    pd_test = h2o_test.as_data_frame()

    # One argument tuple per chunk of parameter combinations.
    argsGBM = [(chunk, feats_final, target, pd_train, pd_test, fold, pd_scores_final_GBM)
               for chunk in (gbm_df0, gbm_df1, gbm_df2, gbm_df3, gbm_df4)]

    pool_results3 = p.starmap(finalFold, argsGBM)

    # Stack the per-task results into a single frame.
    pd_scores_final_GBM = pd.concat([pd.DataFrame(r) for r in pool_results3],
                                    axis=0, ignore_index=True)
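The dispatch-and-combine pattern above can be tried in isolation without H2O. This is a minimal sketch under stated assumptions: `score_chunk` and `run_fold` are hypothetical stand-ins, the score is a placeholder formula rather than a trained model, and `multiprocessing.pool.ThreadPool` is used so the example runs without pickling or `__main__`-guard concerns. `mp.Pool` exposes the same `starmap` method. Here each task returns only its own rows, and the caller does the accumulation.

```python
from multiprocessing.pool import ThreadPool


def score_chunk(params, fold):
    """Stand-in for finalFold: returns one row per parameter set in this chunk."""
    rows = []
    for param in params:
        # Placeholder score; a real worker would train and evaluate a model here.
        score = 1.0 / (1 + param["max_depth"])
        rows.append({"fold": fold, "score": score, "param_idx": param["counter"]})
    return rows


def run_fold(chunks, fold, n_workers=5):
    """Dispatch one chunk per task and flatten the per-chunk results."""
    args = [(chunk, fold) for chunk in chunks]
    with ThreadPool(processes=n_workers) as pool:
        per_chunk = pool.starmap(score_chunk, args)  # results keep the order of args
    return [row for rows in per_chunk for row in rows]


# Made-up chunks: five parameter sets, one per chunk.
chunks = [[{"max_depth": d, "counter": i}] for i, d in enumerate([3, 4, 5, 6, 7])]
all_rows = []
for fold in range(2):
    all_rows.extend(run_fold(chunks, fold))
```

Checking that the `param_idx` values within one fold are all different is a quick way to confirm that each task really received a different chunk.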

However, what I see is that the individual parts of pool_results3 contain the same results. That is:

[two images, not preserved]

What is wrong with the code?
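One thing that may be worth checking, offered as a guess rather than a confirmed diagnosis: every task receives the same pd_scores_final_GBM and returns it with its new rows appended, so from the second fold onward each element of pool_results3 would begin with the same earlier rows and differ only at the end. The sketch below reproduces that effect with plain lists; `append_rows` is a hypothetical stand-in for finalFold, and the data is made up.

```python
def append_rows(params, fold, accumulated):
    """Stand-in for finalFold: returns the accumulated rows plus this chunk's rows."""
    result = list(accumulated)  # copy, so the caller's list is not mutated
    for param in params:
        result.append({"fold": fold, "param_idx": param["counter"]})
    return result


chunks = [[{"counter": i}] for i in range(5)]  # one made-up parameter set per chunk
accumulated = []
for fold in range(2):
    results = [append_rows(chunk, fold, accumulated) for chunk in chunks]
    # Same combining step as in the question: concatenate every returned part.
    accumulated = [row for part in results for row in part]
    # Everything except the last row of each part came from the shared input.
    shared_prefix = [part[:-1] for part in results]
```

After the second fold, every part shares the same five leading rows and only the last row differs, and the combined result has grown to 30 rows instead of 10. If that matches what the output shows, returning only the new rows from each task and concatenating them in the parent process would avoid the repetition.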

0 answers:

No answers