Scikit-Learn Decision Tree: probability of the prediction being a or b?

Asked: 2017-11-12 17:15:20

Tags: python machine-learning scikit-learn classification decision-tree

I have a basic decision tree classifier built with Scikit-Learn:

#Used to determine men from women based on height and shoe size

from sklearn import tree

#height and shoe size
X = [[65,9],[67,7],[70,11],[62,6],[60,7],[72,13],[66,10],[67,7.5]]

Y=["male","female","male","female","female","male","male","female"]

#creating a decision tree
clf = tree.DecisionTreeClassifier()

#fitting the data to the tree
clf.fit(X, Y)

#predicting the gender for a new sample (predict expects a 2D array: one row per sample)
prediction = clf.predict([[68,9]])

#print the predicted gender
print(prediction)

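When I run the program, it always outputs either "male" or "female", but how can I see the probability of the prediction being male or female? For example, the prediction above returns "male", but how would I get it to print the probability of the prediction being male?

Thanks!

3 Answers:

Answer 0: (score: 1)

You can do something like the following: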

from sklearn import tree

#load data
X = [[65,9],[67,7],[70,11],[62,6],[60,7],[72,13],[66,10],[67,7.5]]
Y=["male","female","male","female","female","male","male","female"]

#build model
clf = tree.DecisionTreeClassifier()

#fit
clf.fit(X, Y)

#predict
prediction = clf.predict([[68,9],[66,9]])

#probabilities
probs = clf.predict_proba([[68,9],[66,9]])

#print the predicted gender
print(prediction)
print(probs)

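Theory

The result of clf.predict_proba(X) is the predicted class probability for each sample, which is the fraction of training samples of each class in the leaf that the sample falls into.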
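To make the "fraction of samples in the leaf" idea concrete, here is a minimal sketch that recomputes the probabilities by hand and compares them with predict_proba. It reuses the toy data from this thread; random_state=0 and the variable names are my own additions, and it assumes no class weights or sample weights are in use.

```python
from collections import Counter

from sklearn import tree

# Same toy data as in the thread: [height, shoe size]
X = [[65, 9], [67, 7], [70, 11], [62, 6], [60, 7], [72, 13], [66, 10], [67, 7.5]]
Y = ["male", "female", "male", "female", "female", "male", "male", "female"]

clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X, Y)

query = [[68, 9]]

# apply() returns the index of the leaf each sample ends up in
train_leaves = clf.apply(X)
query_leaf = clf.apply(query)[0]

# Collect the training labels that share the query's leaf
labels_in_leaf = [label for label, leaf in zip(Y, train_leaves) if leaf == query_leaf]
counts = Counter(labels_in_leaf)

# Fraction of each class in that leaf, in the order given by clf.classes_
manual_probs = [counts[c] / len(labels_in_leaf) for c in clf.classes_]

print(manual_probs)
print(clf.predict_proba(query)[0])
```

Both prints should show the same numbers, since the leaf fractions are what predict_proba reports in this unweighted setting.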

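Interpretation of the result:

The first print returns ['male' 'male'], so both samples in [[68,9],[66,9]] are predicted as male.

The second print returns:

[[ 0.  1.]
 [ 0.  1.]]

Each row corresponds to one sample and each column to one class. The second column is the probability of "male", so both samples are predicted as male with probability 1.

To see the order of the classes, use: clf.classes_

This returns: ['female', 'male']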
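Since the columns follow clf.classes_, it can be safer to look a class up by name than to rely on its column position. A small sketch along those lines, again on the thread's toy data (random_state=0 and the helper variable names are my own):

```python
from sklearn import tree

# Same toy data as in the thread: [height, shoe size]
X = [[65, 9], [67, 7], [70, 11], [62, 6], [60, 7], [72, 13], [66, 10], [67, 7.5]]
Y = ["male", "female", "male", "female", "female", "male", "male", "female"]

clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X, Y)

samples = [[68, 9], [66, 9]]
probs = clf.predict_proba(samples)

# Pair each column of predict_proba with its class label from clf.classes_
for sample, row in zip(samples, probs):
    by_class = {str(label): float(p) for label, p in zip(clf.classes_, row)}
    print(sample, by_class)

# Probability of one specific class, looked up by name instead of by position
male_col = list(clf.classes_).index("male")
p_male = probs[:, male_col]
print(p_male)
```

This directly answers the original question: p_male holds the probability of "male" for each sample.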

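Answer 1: (score: 1)

The top answer is correct. You are getting binary outputs because the tree is grown to full depth and is not truncated to make it weaker. You can set max_depth to a lower value so that the probabilities are no longer like [0. 1.] but look more like [0.25 0.75].

Another issue here is that the dataset is very small and easy to separate, so it is better to try this on a more complex dataset.

Some links that may make this clearer:
https://rpmcruz.github.io/machine%20learning/2018/02/09/probabilities-trees.html
https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier.predict_proba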
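A rough sketch of the max_depth effect follows. Note that the data here is made up for illustration and is not the dataset from the question: it uses height as the only feature so that the two classes overlap and no single threshold separates them. random_state=0 is also my own addition.

```python
from sklearn import tree

# Hypothetical data with overlapping classes: one feature (height) only
X = [[60], [62], [64], [66], [67], [68], [70], [72]]
Y = ["female", "female", "female", "male", "female", "male", "male", "male"]

# Fully grown tree: every leaf is pure, so probabilities are only 0 or 1
full = tree.DecisionTreeClassifier(random_state=0).fit(X, Y)
full_probs = full.predict_proba(X)

# Truncated tree: a single split leaves at least one mixed leaf,
# so some probabilities fall strictly between 0 and 1
shallow = tree.DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, Y)
shallow_probs = shallow.predict_proba(X)

print(full_probs)
print(shallow_probs)
```

On the question's original data this would not change anything, because shoe size alone already separates the two classes with one split; that is the "easy to separate" issue mentioned above.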

Answer 2: (score: 0)

It sounds like you need to read the sklearn documentation for DecisionTreeClassifier and see:

{{1}}