Question

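I am new to scikit-learn and random forest regression, and I was wondering whether there is an easy way to get the prediction from each tree in a random forest, in addition to the combined prediction. I would like to output all of the predictions in a list rather than view the whole tree. I know I can get the leaf indices with the apply method, but I am not sure how to use them to get the values from the leaves. Any help is appreciated.

Edit: Based on the comments, here is what I have so far. It was not clear to me before that the trees in the estimators_ attribute could be called, but it seems the predict method can be used on each tree through that attribute. Is this the best way to do it?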

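from sklearn.ensemble import RandomForestRegressor

numberTrees = 100
clf = RandomForestRegressor(n_estimators=numberTrees)
clf.fit(X, Y)  # X, Y: training features and targets
# Print each tree's prediction for one row of the validation DataFrame `val`.
# irow() no longer exists in pandas; iloc[[1]] keeps the row two-dimensional.
for tree in range(numberTrees):
    print(clf.estimators_[tree].predict(val.iloc[[1]].values))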

Answer 1

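Chunky, I use sklearn for Kaggle competitions, and I am fairly sure that what you have is about the best you can do. As you note, predict() returns the prediction of the whole RF, not the predictions of its component trees. It can return a matrix, but only when several targets are learned at once. In that case it returns one prediction per target, not one per tree. In R's randomForest you can get individual tree predictions with predict.all=TRUE, but sklearn has no equivalent. If you try apply(), you get a matrix of leaf indices, and you still have to iterate over the trees to find the prediction for each tree/leaf combination. So I think what you have is about as good as it gets.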
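
For what it is worth, the leaf lookup mentioned above can be sketched as follows. This is only a minimal illustration, assuming a single-output regressor and synthetic data from make_regression (the data and the names forest, leaves, from_leaves and direct are made up for the example). It relies on each fitted tree exposing its node values through tree_.value:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, purely for illustration.
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
forest = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# apply() gives the leaf index reached in each tree: shape (n_samples, n_trees).
leaves = forest.apply(X[:3])

# For a single-output regressor, tree_.value has shape (n_nodes, 1, 1);
# the entry at a leaf index is the value that tree predicts for that leaf.
from_leaves = np.column_stack([
    est.tree_.value[leaves[:, i], 0, 0]
    for i, est in enumerate(forest.estimators_)
])

# Calling predict() on each tree should give the same numbers.
direct = np.column_stack([est.predict(X[:3]) for est in forest.estimators_])
print(np.allclose(from_leaves, direct))
```

Since the lookup ends up matching predict() on each tree, looping over estimators_ is the simpler of the two routes.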

Answer 2

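I am not 100% sure what exactly you want, but scikit-learn's RandomForestRegressor has some other methods that may well return what you are after, in particular the predict method. This method returns an array of predicted values. What you describe about getting the mean refers to the score method, which simply uses the predict method to return the coefficient of determination R².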
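
A small sketch of the relationship between predict and score described here, assuming synthetic data from make_regression (the variable names are illustrative only). It compares score() with r2_score computed on the output of predict():

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Synthetic data, purely for illustration.
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
forest = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

y_pred = forest.predict(X)          # one combined prediction per sample
r2_via_score = forest.score(X, y)   # R^2, computed from predict() internally
r2_manual = r2_score(y, y_pred)     # the same quantity, computed by hand

print(r2_via_score, r2_manual)
```

Note that predict() here still returns the forest's combined prediction, not one value per tree.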

Answer 3

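I had the same problem, but I do not see how you got the right answer by calling predict on clf.estimators_[tree]. It gave me random numbers rather than the actual classes. After reading the sklearn source code, I realized that we actually have to use predict_proba() instead of predict in the code. It gives you the class the tree predicted, in the order given by clf.classes_. For example: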

    tree_num = 2
    tree_pred = clf.estimators_[tree_num].predict_proba(data_test)
    print(clf.classes_)  # gives you the order of the classes
    print(tree_pred)  # gives you an array of 0 with the predicted class as 1
    >>> ['class1','class2','class3']
    >>> [0, 1, 0]

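You can also call clf.predict_proba() on your data. It gives you the probability of each class prediction, accumulated over the trees, and saves you the pain of going through every tree yourself: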

    x = clf.predict_proba(data_test)  # assume data_test has two instances
    print(clf.classes_)
    print(x)
    >>> ['class1', 'class2', 'class3']
    >>> [[0.12 ,  0.02,  0.86], # probabilities for the first instance
         [0.35 ,  0.01,  0.64]]  # for the second instance
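
The "random numbers" from a tree's predict are, as far as I can tell, class indices: the trees inside a RandomForestClassifier are fitted on integer-encoded labels, so each tree returns positions in clf.classes_ rather than the labels themselves. A minimal sketch under that assumption, using the iris data with string labels (the dataset and the names tree_idx, tree_labels and mean_proba are chosen only for illustration):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
X = iris.data
y = iris.target_names[iris.target]  # string labels instead of 0/1/2

clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
data_test = X[:5]

# Each tree's predict() returns indices (as floats) into clf.classes_.
tree_idx = np.column_stack(
    [tree.predict(data_test) for tree in clf.estimators_]
).astype(int)
tree_labels = clf.classes_[tree_idx]  # shape (n_samples, n_trees)

# Averaging the per-tree probabilities gives the forest's predict_proba.
mean_proba = np.mean(
    [tree.predict_proba(data_test) for tree in clf.estimators_], axis=0
)
print(tree_labels[0])
print(np.allclose(mean_proba, clf.predict_proba(data_test)))
```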

Answer 4

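What I did recently was modify the sklearn source code to get this. Inside the sklearn package, in the source for sklearn.ensemble.RandomForestRegressor,

there is a function where, if you add a print, you will see the individual result for each tree. You can change it to return the values and so get each tree's individual result.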

def _accumulate_prediction(predict, X, out, lock):
    """
    This is a utility function for joblib's Parallel.

    It can't go locally in ForestClassifier or ForestRegressor, because joblib
    complains that it cannot pickle it when placed there.
    """
    prediction = predict(X, check_input=False)
    print(prediction)
    with lock:
        if len(out) == 1:
            out[0] += prediction
        else:
            for i in range(len(out)):
                out[i] += prediction[i]

This is a bit complicated, because you have to modify the sklearn source code.
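
If patching the library is not an option, the same per-tree outputs can probably be collected from the outside through the public estimators_ attribute. A minimal sketch, where per_tree_predictions is a hypothetical helper written for this example (it is not part of sklearn) and the data is synthetic:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor


def per_tree_predictions(forest, X):
    """Hypothetical helper: one column of predictions per fitted tree."""
    return np.column_stack([tree.predict(X) for tree in forest.estimators_])


# Synthetic data, purely for illustration.
X, y = make_regression(n_samples=100, n_features=4, noise=0.1, random_state=0)
forest = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

all_preds = per_tree_predictions(forest, X[:2])
print(all_preds.shape)  # (2, 20): 2 samples, 20 trees
```

This keeps the installed sklearn untouched, at the cost of predicting tree by tree in plain Python rather than inside the library's parallel loop.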

How to output the regression prediction from each tree in a random forest in Python scikit-learn?
