Using Python

Date: 2018-03-07 15:22:00

Tags: python decision-tree

I am trying to train a decision tree with scikit-learn:

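from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split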

data = df_train
target = data['SeriousDlqin2yrs']
#Split in train and test
X_train,X_test,target_train,target_test = train_test_split(data, target,
                                                           test_size=0.33, random_state=3)
#Drop target variable
X_train = X_test.drop(['SeriousDlqin2yrs'],axis=1,inplace=False)
X_test = X_test.drop(['SeriousDlqin2yrs'],axis=1,inplace=False)
#fit the tree
tree_clf = tree.DecisionTreeClassifier(max_depth=3).fit(X_train, target_train)
#make prediction
predicted_tree = tree_clf.predict(X_test)
print(classification_report(target_test, predicted_tree))

I don't know why, but I get this error:

ValueError: Number of labels=96427 does not match number of samples=47495
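The two numbers in the message fit the pattern of the training features having been replaced by the test features. A minimal check, assuming (hypothetically) that both counts come from one split of a single frame, so that the frame had 96,427 + 47,495 = 143,922 rows: splitting that many rows with the same test_size=0.33 and random_state=3 yields exactly these two sizes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical row count, inferred by adding the two numbers in the error message
n_rows = 96427 + 47495

# Split row indices the same way the question splits df_train
train_idx, test_idx = train_test_split(np.arange(n_rows), test_size=0.33, random_state=3)

# 33% of 143,922 rounds up to 47,495 test rows; the remaining 96,427 are training rows
print(len(train_idx), len(test_idx))  # 96427 47495
```

Under that assumption, the 96,427 "labels" are target_train and the 47,495 "samples" are test rows that ended up in X_train.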

If I don't drop the target variable, it works, but I get an AUC score of 1.0, which is also strange:

from sklearn.metrics import roc_auc_score
roc_auc_score(target_test, predicted_tree)

Out[139]:1.0
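A perfect score like this is the usual symptom of target leakage: if SeriousDlqin2yrs stays in the feature matrix, the tree can split on the label itself and separate the classes perfectly. The sketch below illustrates this on synthetic data (an assumption, since df_train is not available): pure-noise features plus an independent binary label, fitted once with the label column kept as a feature and once with it dropped. The helper holdout_auc is hypothetical, written only for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-in for df_train: noise features and a label unrelated to them
df = pd.DataFrame(rng.normal(size=(n, 3)), columns=["f1", "f2", "f3"])
df["SeriousDlqin2yrs"] = rng.integers(0, 2, size=n)


def holdout_auc(features):
    """Fit a depth-3 tree on a 67/33 split and return the test AUC of its hard predictions."""
    target = df["SeriousDlqin2yrs"]
    X_tr, X_te, y_tr, y_te = train_test_split(features, target, test_size=0.33, random_state=3)
    clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
    return roc_auc_score(y_te, clf.predict(X_te))


leaky_auc = holdout_auc(df)  # label column still among the features
clean_auc = holdout_auc(df.drop(columns=["SeriousDlqin2yrs"]))

print(leaky_auc)  # 1.0: the root split is on the label column itself
print(clean_auc)  # expected to be near 0.5, since the features carry no signal
```

With the label dropped, the score falls back to chance on this noise-only data, which is what an honest holdout score should look like here.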

Does anyone know why this doesn't work?

Thanks!

1 answer:

Answer 0 (score: 1)

You have this line in your code:

X_train = X_test.drop(['SeriousDlqin2yrs'],axis=1,inplace=False)

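It looks like you are storing the test data, with the column dropped, in your training features. As written, X_train ends up holding the 47,495 test rows while target_train still holds the 96,427 training labels, which is exactly the mismatch reported in the error. Use X_train instead of X_test on that line:

X_train = X_train.drop(['SeriousDlqin2yrs'],axis=1,inplace=False)

Alternatively, call X_train.drop(['SeriousDlqin2yrs'], axis=1, inplace=True) and don't reassign the result, since drop with inplace=True returns None.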
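A corrected version of the question's code, as a sketch. It assumes a DataFrame with a binary SeriousDlqin2yrs column; because df_train is not available here, a small synthetic frame built with make_classification stands in for it (hypothetical data and column names). Separating the target from the features before the split avoids this kind of mix-up altogether, since train_test_split then returns matching feature/label pairs.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for df_train: 5 numeric features plus a binary target column
X_arr, y_arr = make_classification(n_samples=1000, n_features=5, random_state=3)
df_train = pd.DataFrame(X_arr, columns=[f"x{i}" for i in range(5)])
df_train["SeriousDlqin2yrs"] = y_arr

# Separate target and features once, before splitting
target = df_train["SeriousDlqin2yrs"]
features = df_train.drop(columns=["SeriousDlqin2yrs"])

X_train, X_test, target_train, target_test = train_test_split(
    features, target, test_size=0.33, random_state=3
)

# Each feature matrix now has exactly as many rows as its label vector
assert len(X_train) == len(target_train)
assert len(X_test) == len(target_test)

tree_clf = DecisionTreeClassifier(max_depth=3).fit(X_train, target_train)
predicted_tree = tree_clf.predict(X_test)
print(classification_report(target_test, predicted_tree))
```

Dropping the column before the split also keeps the label out of the feature matrix, which rules out the leakage that produces an AUC of exactly 1.0.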