Per-tree predicted probabilities from the sklearn Random Forest classifier

Time: 2017-02-22 17:38:46

Tags: scikit-learn classification random-forest

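Quoting the sklearn documentation:

The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the trees in the forest.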
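A quick way to check that statement is to average the per-tree outputs by hand and compare them with the forest's own result. The sketch below is only an illustration: the synthetic dataset and the variable names `per_tree_proba` and `mean_proba` are assumptions, not part of the sklearn documentation. It relies on the fitted trees being available in the `estimators_` attribute, which is the same list that `predict_proba` iterates over in the source.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# Each fitted tree lives in forest.estimators_ and has its own predict_proba.
# Stacked result has shape (n_trees, n_samples, n_classes).
per_tree_proba = np.stack([tree.predict_proba(X) for tree in forest.estimators_])

# Averaging over the tree axis should reproduce the forest's output.
mean_proba = per_tree_proba.mean(axis=0)
print(np.allclose(mean_proba, forest.predict_proba(X)))
```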

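My question: is there a way to extract the mean squared error for each predicted probability?

For example, the predicted probabilities from each individual tree should be available somewhere, but I cannot find them.

Edit

Here is the sklearn source code for the predict_proba function: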

def predict_proba(self, X):
    """Predict class probabilities for X.
    The predicted class probabilities of an input sample are computed as
    the mean predicted class probabilities of the trees in the forest. The
    class probability of a single tree is the fraction of samples of the same
    class in a leaf.

    Parameters
    ----------
    X : array-like or sparse matrix of shape = [n_samples, n_features]
        The input samples. Internally, its dtype will be converted to
        ``dtype=np.float32``. If a sparse matrix is provided, it will be
        converted into a sparse ``csr_matrix``.

    Returns
    -------
    p : array of shape = [n_samples, n_classes], or a list of n_outputs
        such arrays if n_outputs > 1.
        The class probabilities of the input samples. The order of the
        classes corresponds to that in the attribute `classes_`.
    """
    # Check data
    X = self._validate_X_predict(X)

    # Assign chunk of trees to jobs
    n_jobs, _, _ = _partition_estimators(self.n_estimators, self.n_jobs)

    # Parallel loop
    all_proba = Parallel(n_jobs=n_jobs, verbose=self.verbose,
                         backend="threading")(
        delayed(parallel_helper)(e, 'predict_proba', X,
                                  check_input=False)
        for e in self.estimators_)

    # Reduce
    proba = all_proba[0]

    if self.n_outputs_ == 1:
        for j in range(1, len(all_proba)):
            proba += all_proba[j]

        proba /= len(self.estimators_)

    else:
        for j in range(1, len(all_proba)):
            for k in range(self.n_outputs_):
                proba[k] += all_proba[j][k]

        for k in range(self.n_outputs_):
            proba[k] /= self.n_estimators

    return proba

So it seems I can easily access the individual tree probabilities through the all_proba array.

I'll go ahead and implement this!
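A minimal sketch of what that could look like, without patching sklearn itself: rebuild the `all_proba` array from `estimators_` outside the library and measure how far the trees scatter around the forest average. Note the assumptions here. The helper name `tree_proba_spread` is hypothetical, the data is synthetic, and "mean squared error" is interpreted as the mean squared deviation of the per-tree probabilities from the forest mean (the variance across trees), not as an error against the true labels. The sketch also assumes a single-output classifier.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def tree_proba_spread(forest, X):
    """Return (mean, mse) of the per-tree class probabilities for X.

    mse[i, c] is the mean squared deviation of the trees' probabilities for
    sample i and class c from the forest average (the variance across trees).
    Hypothetical helper; assumes a single-output classifier.
    """
    # Shape: (n_trees, n_samples, n_classes)
    all_proba = np.stack([tree.predict_proba(X) for tree in forest.estimators_])
    mean = all_proba.mean(axis=0)
    mse = ((all_proba - mean) ** 2).mean(axis=0)
    return mean, mse


# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=300, n_features=10, random_state=1)
forest = RandomForestClassifier(n_estimators=50, random_state=1)
forest.fit(X[:200], y[:200])

mean, mse = tree_proba_spread(forest, X[200:])
print(mean[:3])  # forest-level probabilities for the first three test samples
print(mse[:3])   # spread of the individual trees around those probabilities
```

A large value in `mse` for a sample means the trees disagree strongly about it, so the averaged probability for that sample deserves less trust than one where the trees mostly agree.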

0 answers:

No answers