Decision tree precision_score gives "ValueError: Found input variables with inconsistent numbers of samples"

Time: 2018-10-13 10:53:00

Tags: python-3.x machine-learning scikit-learn decision-tree

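I am trying to build a decision tree from the given data. But for some reason, accuracy_score gives

ValueError: Found input variables with inconsistent numbers of samples:

when I split the training data into validation (20%) and training (80%) sets.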

This is how I split the data:

from sklearn.utils import shuffle

from sklearn.model_selection import train_test_split

# stDt shuffled training set

stDt = shuffle(tDt) 

#divide shuffled training set to training and validation set

stDt, vtDt = train_test_split(stDt,train_size=0.8, shuffle=False)

print(tDt.shape)
print(stDt.shape)
print(vtDt.shape)

This is how I train on the data:

# attributes and labels of training set

attributesT =  stDt.values

labelsT = stDt.label


# Train Decision tree classifiers
from sklearn.tree import DecisionTreeClassifier


dtree1 = DecisionTreeClassifier(min_samples_split = 1.0)

dtree2 = DecisionTreeClassifier(min_samples_split = 3)

dtree3 = DecisionTreeClassifier(min_samples_split = 5)



fited1 = dtree1.fit(attributesT,labelsT)

fited2 = dtree2.fit(attributesT,labelsT)

fited3 = dtree3.fit(attributesT,labelsT)

This is the testing and accuracy score part:

from sklearn.metrics import accuracy_score

ret1 = fited1.predict(stDt)

ret2 = fited2.predict(stDt)

ret3 = fited3.predict(stDt)

print(accuracy_score(vtDt.label,ret1))

1 Answer:

Answer 0: (score: 1)

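The error is expected, because you are comparing the predictions produced from your training set (ret1 = fited1.predict(stDt)) with the labels of your validation set (vtDt.label). The two arrays have different lengths, so accuracy_score cannot pair them up.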
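The length mismatch can be reproduced in isolation. The sketch below is only an illustration: the array sizes (80 and 20) and the names `train_predictions` and `validation_labels` are made up to stand in for `fited1.predict(stDt)` and `vtDt.label`.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical sizes: 80 training rows and 20 validation rows.
train_predictions = np.zeros(80, dtype=int)  # stands in for fited1.predict(stDt)
validation_labels = np.zeros(20, dtype=int)  # stands in for vtDt.label

try:
    # y_true and y_pred must have the same number of samples.
    accuracy_score(validation_labels, train_predictions)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Comparing `len(vtDt.label)` with `len(ret1)` before scoring is a quick way to catch this kind of mix-up.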

Here is the correct way to get both the training and the validation accuracy for your fited1 model (and similarly for the others):

# predictions on the training set:
ret1 = fited1.predict(stDt)
# training accuracy:
accuracy_score(stDt.label, ret1)
# predictions on the validation set:
pred1 = fited1.predict(vtDt)
# validation accuracy:
accuracy_score(vtDt.label, pred1)
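For a self-contained version of the same idea, here is a minimal sketch on synthetic data. The DataFrame contents, the column names `f1` and `f2`, and the variable names are assumptions made for illustration. Dropping the `label` column from the features is also my assumption about what was intended: in the question, `stDt.values` still contains the label, which would let the tree read the answer directly.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for tDt: two feature columns plus a "label" column.
rng = np.random.default_rng(0)
tDt = pd.DataFrame({"f1": rng.normal(size=100), "f2": rng.normal(size=100)})
tDt["label"] = (tDt["f1"] + tDt["f2"] > 0).astype(int)

# 80/20 split; train_test_split shuffles by default.
train_df, valid_df = train_test_split(tDt, train_size=0.8, random_state=0)

# Assumption: keep the label out of the feature matrix.
X_train, y_train = train_df.drop(columns="label"), train_df["label"]
X_valid, y_valid = valid_df.drop(columns="label"), valid_df["label"]

model = DecisionTreeClassifier(min_samples_split=3, random_state=0)
model.fit(X_train, y_train)

# Each score pairs labels with predictions made on the same rows.
train_acc = accuracy_score(y_train, model.predict(X_train))
valid_acc = accuracy_score(y_valid, model.predict(X_valid))
print(train_acc, valid_acc)
```

The point to carry over is that both arguments to `accuracy_score` come from the same subset: training labels with training predictions, validation labels with validation predictions.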