Question

I'm writing a program that builds a confusion matrix by hand. I have to loop over it 10,000 times.

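from tqdm import tqdm

# df_a has a 0/1 label column 'y' and a score column 'proba'.
# confusion_matrix is the asker's own row-wise helper (not shown, not sklearn's):
# it maps a (label, prediction) row to 'TP', 'FP', 'TN' or 'FN'.
df_a = df_a.sort_values('proba')
tpr_lst = []
fpr_lst = []
# df_a['proba'] holds 10K values; each one is used in turn as the threshold that
# decides whether y_pred is 0 or 1, to collect the points of an ROC curve.
for i in tqdm(df_a['proba']):
    def y_pred_auc(x):
        if x < i:
            return 0
        else:
            return 1
    df_a['y_pred_auc'] = df_a['proba'].map(y_pred_auc)
    # Use the thresholded predictions for this i, not the original 'y_pred' column
    df_a['con_mat_label_auc'] = df_a[['y', 'y_pred_auc']].apply(confusion_matrix, axis=1)
    # Count matching rows; len() of a boolean Series is always len(df_a)
    tp_count = (df_a['con_mat_label_auc'] == 'TP').sum()
    fp_count = (df_a['con_mat_label_auc'] == 'FP').sum()
    tn_count = (df_a['con_mat_label_auc'] == 'TN').sum()
    fn_count = (df_a['con_mat_label_auc'] == 'FN').sum()

    tpr_auc = tp_count / (tp_count + fn_count)
    fpr_auc = fp_count / (tn_count + fp_count)

    tpr_lst.append(tpr_auc)
    fpr_lst.append(fpr_auc)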

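Even on a c4 AWS SageMaker instance this code takes about an hour. Is there any way to optimize it? Alternatively, can anyone suggest a faster AWS SageMaker instance? I tried Colab, and it was even slower there.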
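A minimal vectorized sketch (an illustration, not code from the original thread), assuming a 0/1 label array `y` and a score array `proba` like the `df_a['y']` and `df_a['proba']` columns above, with both classes present. Sorting the scores once and taking suffix sums gives the TP and FP counts at every threshold directly, instead of re-running a row-wise `apply` 10K times (roughly O(n log n) instead of O(n^2) Python-level work). `roc_points` is a hypothetical helper name; it uses the same rule as the question, predicting 1 when `proba >= threshold`.

```python
import numpy as np

def roc_points(y, proba):
    """Hypothetical helper: TPR and FPR at every threshold t taken from proba,
    predicting 1 when proba >= t (the rule used by y_pred_auc in the question).
    Assumes y is 0/1 and contains both classes."""
    y = np.asarray(y, dtype=int)
    proba = np.asarray(proba, dtype=float)

    # Sort once by score (ascending, like df_a.sort_values('proba'))
    order = np.argsort(proba, kind="mergesort")
    y_sorted = y[order]
    thresholds = proba[order]

    n_pos = y_sorted.sum()
    n_neg = len(y_sorted) - n_pos

    # Suffix sums: number of positives / negatives at sorted position k or later
    pos_suffix = np.cumsum(y_sorted[::-1])[::-1]
    neg_suffix = np.cumsum(1 - y_sorted[::-1])[::-1]

    # With tied scores, "proba >= t" starts at the first occurrence of t
    first = np.searchsorted(thresholds, thresholds, side="left")
    tp = pos_suffix[first]  # predicted 1 and actually 1
    fp = neg_suffix[first]  # predicted 1 but actually 0

    # TPR = TP / (TP + FN) = TP / n_pos;  FPR = FP / (FP + TN) = FP / n_neg
    return thresholds, tp / n_pos, fp / n_neg


# Tiny usage example
thr, tpr, fpr = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(tpr.tolist())  # [1.0, 1.0, 0.5, 0.5]
print(fpr.tolist())  # [1.0, 0.5, 0.5, 0.0]
```

With the question's DataFrame this would be called as `thr, tpr_lst, fpr_lst = roc_points(df_a['y'], df_a['proba'])`. scikit-learn's `sklearn.metrics.roc_curve(y_true, y_score)` computes the same kind of curve; by default (`drop_intermediate=True`) it omits some intermediate points, so its arrays can be shorter than one point per row.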

Answer 1

Use a SageMaker ml.p2.xlarge instance, or a p2.xlarge.

Stop the instance when you are finished with it so you don't pay more than you need to.

https://course.fast.ai/start_sagemaker.html

