AttributeError: 'DataFrame' object has no attribute 'Class'

Time: 2018-09-25 13:05:51

Tags: python roc auc

Here is a sample of my dataset:

   Pat_ID  Flare_Up  Demo1     Demo2     Demo3     Demo4  Demo5     Demo6  DisHis1  DisHis1Times  DisHis2    ...     Dis6Treat  Dis7  RespQues1  ResQues1a  ResQues1b  ResQues1c  ResQues2a  SmokHis1  SmokHis2  SmokHis3  SmokHis4
0       1         0      1  0.246004  0.391931  0.237792      0  0.443526        0      0.000000        0    ...             1     0    0.12623     0.1032     0.2439     0.0597        0.0  0.411765  0.263620  0.482759    0.1875
1       2         1      1  0.225851  0.268012  0.268481      0  0.286501        0      0.000000        1    ...             1     0    0.60707     0.3808     0.8637     0.4949        0.1  0.117647  0.098418  0.624138    0.0000
2       3         0      0  0.342599  0.476945  0.296468      1  0.159780        1      0.166667        1    ...             0     0    0.77541     0.6318     1.0000     0.6570        0.3  0.035294  0.020211  0.510345    0.0000

[3 rows x 62 columns]  

My code for iterating over this dataset and plotting the ROC is:

import pandas as pd 
import matplotlib.pyplot as plt 
import numpy as np 
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import confusion_matrix, classification_report, roc_curve, roc_auc_score
import itertools

def plot_confusion_matrix(cm, classes, normalize=True, title='Confusion matrix', cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
#    else:
#        print('Confusion matrix, without normalization')

#    print(cm)

    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, cm[i, j],
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.show()

def show_data(cm, print_res = 0):
    tp = cm[1,1]
    fn = cm[1,0]
    fp = cm[0,1]
    tn = cm[0,0]
    if print_res == 1:
        print('Precision =     {:.3f}'.format(tp/(tp+fp)))
        print('Recall (TPR) =  {:.3f}'.format(tp/(tp+fn)))
        print('Fallout (FPR) = {:.3e}'.format(fp/(fp+tn)))
    return tp/(tp+fp), tp/(tp+fn), fp/(fp+tn)

df = pd.read_csv("datasource/DevelopmentData.csv")
print(df.head(3))
y = np.array(df.Class.tolist())     #classes: 1..fraud, 0..no fraud
df = df.drop('Class', 1)
df = df.drop('Time', 1)     # optional
df['Amount'] = StandardScaler().fit_transform(df['Amount'].values.reshape(-1,1))    #optionally rescale non-normalized column
X = np.array(df.as_matrix())   # features  

Class 0 means the transaction is in order, and class 1 means the transaction is fraudulent.
When I run the code, I get this error:

Traceback (most recent call last):
  File "finalindex.py", line 54, in <module>
    y = np.array(df.Class.tolist())     #classes: 1..fraud, 0..no fraud
  File "C:\Users\kulkaa\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\generic.py", line 4376, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'Class'  

How can I fix this error? Do I need to change the column names to match my dataset?

3 answers:

Answer 0: (score: 1)

Try this:

df = df.drop(['Class'],axis=1)
df = df.drop(['Time'],axis=1) # optional
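A small sketch of how these `drop` calls behave when a column is absent. The frame below is invented for illustration; only the column names `Pat_ID`, `Flare_Up`, and `Demo1` come from the question. `DataFrame.drop` raises `KeyError` for a label that is not present unless `errors="ignore"` is passed:

```python
import pandas as pd

# Hypothetical stand-in for the asker's data; the values are invented.
df = pd.DataFrame({"Pat_ID": [1, 2, 3], "Flare_Up": [0, 1, 0], "Demo1": [1, 1, 0]})

# Dropping a column that does not exist raises KeyError by default.
try:
    df.drop(["Class"], axis=1)
    drop_raised = False
except KeyError:
    drop_raised = True

# errors="ignore" silently skips labels that are not in the frame.
cleaned = df.drop(columns=["Class", "Time"], errors="ignore")
remaining = list(cleaned.columns)
```

Note that the traceback in the question points at `df.Class`, which runs before either `drop` call, so that line still needs a column that actually exists in the data.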

Answer 1: (score: 1)

  

[...] the author of the link (kaggle.com/dstuerzer/optimized-logistic-regression) has used it and his code works fine.

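In the example you linked, the author's dataset has a column named "Class", but the dataset you showed does not. As a result, the Class attribute does not exist on your DataFrame and cannot be accessed.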
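One way to confirm this is to inspect the column names before using them. The sketch below uses an invented frame with a few column names taken from the question. Treating `Flare_Up` as the label column and `Pat_ID` as an identifier to discard are assumptions, since the question does not say which column is the target.

```python
import pandas as pd

# Hypothetical stand-in for DevelopmentData.csv; only the column names
# are taken from the question, the rows are illustrative.
df = pd.DataFrame({
    "Pat_ID": [1, 2, 3],
    "Flare_Up": [0, 1, 0],
    "Demo1": [1, 1, 0],
    "Demo2": [0.246004, 0.225851, 0.342599],
})

# Check whether the column exists before accessing it.
has_class = "Class" in df.columns

# Assumption: Flare_Up plays the role that Class plays in the Kaggle data.
target = "Flare_Up"
y = df[target].to_numpy()                           # labels
X = df.drop(columns=[target, "Pat_ID"]).to_numpy()  # features
```

Bracket access (`df[target]`) works for any column name and raises a `KeyError` that names the missing column, which is usually easier to diagnose than the `AttributeError` from `df.Class`.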

Dominik Stuerzer

   Time        V1        V2        V3        V4        V5        V6        V7  \
0   0.0 -1.359807 -0.072781  2.536347  1.378155 -0.338321  0.462388  0.239599   
1   0.0  1.191857  0.266151  0.166480  0.448154  0.060018 -0.082361 -0.078803   
2   1.0 -1.358354 -1.340163  1.773209  0.379780 -0.503198  1.800499  0.791461   

         V8        V9  ...         V21       V22       V23       V24  \
0  0.098698  0.363787  ...   -0.018307  0.277838 -0.110474  0.066928   
1  0.085102 -0.255425  ...   -0.225775 -0.638672  0.101288 -0.339846   
2  0.247676 -1.514654  ...    0.247998  0.771679  0.909412 -0.689281   

        V25       V26       V27       V28  Amount  Class  
0  0.128539 -0.189115  0.133558 -0.021053  149.62      0  
1  0.167170  0.125895 -0.008983  0.014724    2.69      0  
2 -0.327642 -0.139097 -0.055353 -0.059752  378.66      0  

[3 rows x 31 columns]
     

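Class 0 means the transaction is in order, and class 1 means the transaction is fraudulent. From personal experience, we expect fraud to account for only a small fraction of all transactions. Indeed, in this dataset there are nearly 600 non-fraudulent transactions for every fraudulent one: [...]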
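A ratio like the one quoted above can be computed from the label column with `value_counts`. This is a minimal sketch on invented labels, not the Kaggle data, so the resulting number is illustrative only:

```python
import pandas as pd

# Invented labels: 12 non-fraud (0) and 2 fraud (1) transactions.
labels = pd.Series([0] * 12 + [1] * 2, name="Class")

# Count how often each class occurs, then compare the two counts.
counts = labels.value_counts()
ratio = counts.loc[0] / counts.loc[1]  # non-fraud transactions per fraud
```

With these made-up labels the ratio is 6 non-fraudulent transactions per fraudulent one.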

Answer 2: (score: 0)

Try

R.either