
Posted: 2018-09-26 13:04:27

Tags: python roc confusion-matrix auc

I wrote Python code to analyze a dataset with 62 columns. I am using 299 rows to make predictions with logistic regression. The columns I am considering are:


Using the columns above, I need to predict the binary values of Demo1, Demo5, Dis1, Dis1Treat, Dis2, Dis3, Dis4, Dis4Treat, Dis5, Dis5Treat, Dis6, Dis6Treat, Dis7, DisHis1, DisHis2, DisStage1, DisStage2 and LungFun19.



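import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("datasource/FinalData/Scoring_dataset.csv")
y = df['Demo1'].to_numpy()                    # target: binary values of Demo1
df = df.drop(columns=['Dis2', 'Dis3'])        # exclude these columns from the features
# optionally rescale the non-normalized column LungFun1 to zero mean, unit variance
df['LungFun1'] = StandardScaler().fit_transform(df[['LungFun1']]).ravel()
X = df.to_numpy()                             # all remaining columns become features (as_matrix() was removed in pandas 1.0)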
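A minimal sketch of the same preparation step, run on a small made-up DataFrame because the real `Scoring_dataset.csv` is not available. The helper `build_features` and the column `Dis1` values are hypothetical. Unlike the snippet above, this sketch also drops the target column from the features, which is the usual practice so the model cannot read the label from its own inputs.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

def build_features(df, target, exclude):
    """Hypothetical helper: split df into a feature matrix X and a target vector y.

    The target column and the columns in `exclude` are dropped from X,
    and LungFun1 is rescaled to zero mean and unit variance.
    """
    y = df[target].to_numpy()
    features = df.drop(columns=[target, *exclude])
    features["LungFun1"] = StandardScaler().fit_transform(features[["LungFun1"]]).ravel()
    return features.to_numpy(), y, list(features.columns)

# Toy stand-in for Scoring_dataset.csv (values are made up for illustration)
toy = pd.DataFrame({
    "Demo1":    [0, 1, 0, 1, 1, 0],
    "Dis2":     [1, 0, 0, 1, 0, 1],
    "Dis3":     [0, 0, 1, 1, 0, 0],
    "Dis1":     [1, 1, 0, 0, 1, 0],
    "LungFun1": [2.1, 3.4, 1.8, 2.9, 3.1, 2.0],
})
X, y, cols = build_features(toy, target="Demo1", exclude=["Dis2", "Dis3"])
print(cols)     # ['Dis1', 'LungFun1']
print(X.shape)  # (6, 2)
```

Returning the column names alongside `X` makes it easy to confirm which columns actually reach the model.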

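This is the code I use to compute the confusion matrix:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, X_val, y, y_val = train_test_split(X, y, test_size = 0.2)
print(X, X_val, y, y_val)

lrn = LogisticRegression()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
lrn.fit(X_train, y_train)
y_pred = lrn.predict(X_test)

cm = confusion_matrix(y_test, y_pred)
if lrn.classes_[0] == 1:  # make sure the ordering of the classes is correct
    cm = np.array([[cm[0,1], cm[0,0]], [cm[1,1], cm[1,0]]])
# pc is the author's own plotting/reporting helper module (not shown)
pc.plot_confusion_matrix(cm, ['0', '1'])
pr, tpr, fpr = pc.show_data(cm, print_res = 1)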
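The `pc.show_data` helper is not shown, so here is a hypothetical stand-in, `rates_from_confusion`, that computes the three quantities the question prints (precision, recall/TPR, fallout/FPR) from their standard definitions. It is a sketch, not the author's helper. Passing `labels=[0, 1]` to `confusion_matrix` fixes the row/column order explicitly. For 0/1 labels, scikit-learn's `classes_` is already sorted, so `classes_[0]` is 0.

```python
from sklearn.metrics import confusion_matrix

def rates_from_confusion(y_true, y_pred):
    """Return (precision, recall/TPR, fallout/FPR) for binary labels 0/1."""
    # labels=[0, 1]: rows = true class, columns = predicted class, in that order
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")    # TPR
    fallout = fp / (fp + tn) if (fp + tn) else float("nan")   # FPR
    return precision, recall, fallout

p, r, f = rates_from_confusion([0, 0, 1, 1, 1, 0], [0, 1, 1, 1, 0, 0])
print(p, r, f)  # 2/3, 2/3, 1/3
```

Precision = 1, recall = 1 and FPR = 0 together mean `fp == 0` and `fn == 0`, i.e. every test row was classified correctly.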


Below are the functions I use to plot the ROC curve:

from sklearn.metrics import roc_auc_score, roc_curve

def ROC(X, y, c, r):
    # cross-validation for the given parameters c, r. Returns FPR, TPR (averaged)
    dic_weight = {1: len(y)/(r*np.sum(y)), 0: len(y)/(len(y)-r*np.sum(y))}  # specify class weights
    lrn = LogisticRegression(penalty = 'l2', C = c, class_weight = dic_weight)
    N_iter = 300  # how often to repeat (taking the mean)
    mean_tpr = 0.0
    mean_thresh = 0.0
    mean_fpr = np.linspace(0, 1, 50000)
    mean_auc = 0
    for it in range(N_iter):
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
        lrn.fit(X_train, y_train)
        y_prob = lrn.predict_proba(X_test)[:, 1]  # column 1 = probability of class lrn.classes_[1]
        fpr, tpr, thresholds = roc_curve(y_test, y_prob)
        thresholds[0] = 1
        mean_tpr += np.interp(mean_fpr, fpr, tpr)
        mean_thresh += np.interp(mean_fpr, fpr, thresholds)
        mean_tpr[0] = 0.0
        mean_auc += roc_auc_score(y_test, y_prob)
    mean_tpr /= N_iter
    mean_thresh /= N_iter
    mean_tpr[-1] = 1.0
    return mean_fpr, mean_tpr, mean_auc/N_iter, mean_thresh



import matplotlib.pyplot as plt

def plot_roc(X, y, list_par_1, par_1 = 'C', par_2 = 1):

    f = plt.figure(figsize = (8,4))
    for p in list_par_1:
        if par_1 == 'C':   # vary C, keep r fixed at par_2
            c = p
            r = par_2
        else:              # vary r, keep C fixed at par_2
            r = p
            c = par_2
        list_FP, list_TP, AUC, mean_thresh = ROC(X, y, c, r)
        plt.plot(list_FP, list_TP, label = 'C = {}, r = {}, TPR(3e-4) = {:.4f}'.format(c, r, list_TP[15]))
        del list_FP, list_TP
    plt.legend(title = 'values', loc='lower right')
    plt.xlim(0, 0.001)   # we are only interested in small values of FPR
    plt.ylim(0.5, 0.9)
    plt.title('ROC detail')
    plt.axvline(3e-4, color='b', linestyle='dashed', linewidth=2)

plot_roc(X, y, [1, 3, 10, 30, 100], 'r', 1)
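A minimal sketch of the averaging idea used in `ROC` above: each split's ROC curve is interpolated onto one shared FPR grid, and the TPR values are averaged point by point. The function `mean_roc`, the synthetic data from `make_classification`, and the smaller `n_iter` are assumptions for illustration. The original's `C` and `class_weight` settings are left out for brevity. The sketch also shows where the hard-coded index 15 in `list_TP[15]` comes from: on a 50000-point grid from 0 to 1 the spacing is 1/49999, so index 15 sits at FPR of about 3e-4.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

def mean_roc(X, y, n_iter=20, n_grid=50000, seed=0):
    """Hypothetical helper: average ROC curves over repeated random splits."""
    rng = np.random.RandomState(seed)
    grid = np.linspace(0, 1, n_grid)          # common FPR grid
    tpr_sum = np.zeros(n_grid)
    auc_sum = 0.0
    for _ in range(n_iter):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, stratify=y, random_state=rng)
        model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        prob = model.predict_proba(X_te)[:, 1]  # P(class model.classes_[1])
        fpr, tpr, _ = roc_curve(y_te, prob)
        tpr_sum += np.interp(grid, fpr, tpr)    # resample this split's curve onto the grid
        auc_sum += roc_auc_score(y_te, prob)
    mean_tpr = tpr_sum / n_iter
    mean_tpr[0], mean_tpr[-1] = 0.0, 1.0        # pin the curve's endpoints
    return grid, mean_tpr, auc_sum / n_iter

# Grid index closest to (and not below) FPR = 3e-4
grid = np.linspace(0, 1, 50000)
idx = int(np.searchsorted(grid, 3e-4))
print(idx, grid[idx])  # 15, about 3.00006e-4

X, y = make_classification(n_samples=299, n_features=10, random_state=0)
fpr_grid, mean_tpr, mean_auc = mean_roc(X, y)
print(mean_auc)
```

Computing the index with `np.searchsorted` instead of hard-coding 15 keeps the legend label correct if the grid size changes.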

I don't understand why the results come out as Precision = 1.00000, Recall (TPR) = 1.00000, Fallout (FPR) = 0.00000e+00. Is the data standardized? What is missing from my code?
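Perfect precision and recall with zero FPR on a held-out split mean the classifier separated every test row correctly. One general cause is that the label, or an exact copy of it, is still among the features. The following is a hypothetical check for that, `columns_identical_to_target`, run on made-up data. It is a sketch of one diagnostic, not a claim about what happened in this dataset.

```python
import pandas as pd

def columns_identical_to_target(df, target):
    """Hypothetical check: names of non-target columns equal to the 0/1 target
    in every row, either directly or with 0 and 1 swapped."""
    y = df[target]
    hits = []
    for col in df.columns:
        if col == target:
            continue
        same = (df[col] == y).all()
        flipped = (df[col] == 1 - y).all()
        if same or flipped:
            hits.append(col)
    return hits

toy = pd.DataFrame({
    "Demo1": [0, 1, 0, 1, 1, 0],
    "Dis1":  [1, 1, 0, 0, 1, 0],
    "Copy":  [0, 1, 0, 1, 1, 0],   # made-up duplicate of the target
})
print(columns_identical_to_target(toy, "Demo1"))  # ['Copy']
```

The same idea applies to the target column itself: if it was never dropped before building `X`, it is trivially a perfect feature.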

0 answers
