Question

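This may be a bad question, but what I want to know is: how can I show the probability (as a percentage) of a prediction made by a classification algorithm? I'm using scikit-learn.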

Suppose I'm trying to decide whether a fruit is an apple or an orange based on its weight and texture:

from sklearn import tree

# Features: [weight, texture]; texture 0 = "bumpy", 1 = "smooth"
# Labels:   0 = apple, 1 = orange
features = [[140, 1], [130, 1], [150, 0], [170, 0]]
labels = [0, 0, 1, 1]

# We will be using a Decision Tree in this instance
clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)
print(clf.predict([[160, 0]]))

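So, based on the pattern, both we and the computer would predict that [160, 0] is most likely an orange. Is there a way in scikit-learn to find out how confident the computer is when it returns 1 or 0? This matters especially once my feature vectors have many more parameters.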

Answer 1

Yes.

Just use the predict_proba(X) method instead of predict().

probability = clf.predict_proba([[160, 0]])
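A minimal end-to-end sketch using the question's own data, showing predict() and predict_proba() side by side. The only addition beyond the question's code is random_state=0, assumed here just to make the tree reproducible:

```python
from sklearn import tree

# Training data from the question: [weight, texture], texture 0 = bumpy, 1 = smooth
features = [[140, 1], [130, 1], [150, 0], [170, 0]]
labels = [0, 0, 1, 1]  # 0 = apple, 1 = orange

clf = tree.DecisionTreeClassifier(random_state=0)
clf = clf.fit(features, labels)

prediction = clf.predict([[160, 0]])         # hard label, e.g. [1]
probability = clf.predict_proba([[160, 0]])  # one row per sample, one column per class
print(prediction)
print(probability)
```

With this tiny, perfectly separable training set every leaf is pure, so the probability comes out as 0 for apple and 1 for orange; with more (and noisier) data you get intermediate values.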

Some classifiers in scikit-learn can do this; others do not provide predict_proba at all.
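One way to check whether a given estimator offers probability estimates is to test for the method with hasattr. This is a sketch; the particular estimators listed are just illustrative picks, and the SVC case assumes you opt in with probability=True (which, per the scikit-learn docs, fits the estimates with Platt scaling via internal cross-validation):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier

candidates = {
    "DecisionTreeClassifier": DecisionTreeClassifier(),
    "LogisticRegression": LogisticRegression(),
    "LinearSVC": LinearSVC(),  # margin-based only, has no predict_proba
    "SVC(probability=True)": SVC(probability=True),  # probability estimates are opt-in
}

# True if the estimator exposes a predict_proba method
supports_proba = {name: hasattr(est, "predict_proba") for name, est in candidates.items()}
for name, ok in supports_proba.items():
    print(f"{name}: {ok}")
```

For estimators without predict_proba, decision_function (where available) gives a confidence score, but it is not a calibrated probability.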

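For DecisionTreeClassifier, when asked for the probability of a given class, the model returns the fraction of training samples of that class within the particular "leaf" the example falls into.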
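To see that this is what predict_proba returns, the sketch below recomputes the leaf fraction by hand: clf.apply gives the leaf index for each sample, so we count the training labels that share the new sample's leaf. The training data here is hypothetical (the question's data plus two extra fruits), and max_depth=1 is an assumption chosen so the leaves are likely to be impure:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical, slightly noisy training data: [weight, texture]
X = np.array([[140, 1], [130, 1], [150, 0], [170, 0], [160, 0], [145, 1]])
y = np.array([0, 0, 1, 1, 0, 1])  # 0 = apple, 1 = orange

# A shallow tree, so a leaf can contain both classes
clf = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y)

sample = np.array([[160, 0]])
leaf = clf.apply(sample)[0]     # index of the leaf this sample lands in
in_leaf = clf.apply(X) == leaf  # training samples that end up in the same leaf
manual = np.bincount(y[in_leaf], minlength=2) / in_leaf.sum()

print("leaf fractions:", manual)
print("predict_proba: ", clf.predict_proba(sample)[0])
```

Both lines should print the same numbers (assuming no sample weights were passed to fit).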

A leaf in a decision tree corresponds to a set of conditions (rules) that describe one path down the tree.
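If you want to look at the actual rules a fitted tree learned, scikit-learn can print them with tree.export_text, and decision_path shows which nodes a specific sample passes through. A sketch on the question's data; the feature names "weight" and "texture" are labels chosen here for readability:

```python
from sklearn import tree

features = [[140, 1], [130, 1], [150, 0], [170, 0]]
labels = [0, 0, 1, 1]
clf = tree.DecisionTreeClassifier(random_state=0).fit(features, labels)

# Every root-to-leaf path, printed as nested conditions
rules = tree.export_text(clf, feature_names=["weight", "texture"])
print(rules)

# Sparse indicator matrix: which nodes [160, 0] visits on its way down
node_indicator = clf.decision_path([[160, 0]])
print("nodes visited:", node_indicator.indices)
print("leaf reached:", clf.apply([[160, 0]])[0])
```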

For example, for a sample [x1, x2] equal to [0, 160], the rules might be:

x1, x2 = 0, 160  # the example sample [x1, x2]

if x1 < 10:
    if x2 > 150:
        # Suppose 100 of the `n` training examples fell under this rule set:
        # 75 of them were apples and 25 were oranges, thus:
        probability = [0.75, 0.25]  # P(apple) = .75, P(orange) = .25

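Of course, in the binary (two-class) case scikit-learn returns both probabilities, but you really only need one of them, since they are complementary (1 - .75 = .25).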
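Since the question asks for a percentage, here is a small sketch that formats the output of predict_proba. Note that the columns are ordered by clf.classes_, so it pairs them up explicitly instead of assuming the order; the names mapping is a helper introduced here for display only:

```python
from sklearn import tree

features = [[140, 1], [130, 1], [150, 0], [170, 0]]
labels = [0, 0, 1, 1]
names = {0: "apple", 1: "orange"}  # label meanings from the question

clf = tree.DecisionTreeClassifier(random_state=0).fit(features, labels)

proba = clf.predict_proba([[160, 0]])[0]  # probabilities for the single sample

# Columns follow clf.classes_, so pair them up rather than assuming the order
for cls, p in zip(clf.classes_, proba):
    print(f"{names[cls]}: {p:.0%}")

# In the binary case one column is enough; the other is its complement
p_orange = proba[list(clf.classes_).index(1)]
p_apple = 1 - p_orange
```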

See the scikit-learn documentation for predict_proba for details.

Hope that helps.

Probability of a computer's decision in ML?