引用sklearn:
输入样本的预测类概率计算为 平均预测森林中树木的类概率。
我的问题:有没有办法为每个预测概率提取均方误差?
例如,应该有来自每棵树的预测概率,但我无法找到。
修改
这是predict_proba
函数的sklearn代码:
def predict_proba(self, X):
"""Predict class probabilities for X.
The predicted class probabilities of an input sample are computed as
the mean predicted class probabilities of the trees in the forest. The
class probability of a single tree is the fraction of samples of the same
class in a leaf.
Parameters
----------
X : array-like or sparse matrix of shape = [n_samples, n_features]
The input samples. Internally, its dtype will be converted to
``dtype=np.float32``. If a sparse matrix is provided, it will be
converted into a sparse ``csr_matrix``.
Returns
-------
p : array of shape = [n_samples, n_classes], or a list of n_outputs
such arrays if n_outputs > 1.
The class probabilities of the input samples. The order of the
classes corresponds to that in the attribute `classes_`.
"""
# Check data
X = self._validate_X_predict(X)
# Assign chunk of trees to jobs
n_jobs, _, _ = _partition_estimators(self.n_estimators, self.n_jobs)
# Parallel loop
all_proba = Parallel(n_jobs=n_jobs, verbose=self.verbose,
backend="threading")(
delayed(parallel_helper)(e, 'predict_proba', X,
check_input=False)
for e in self.estimators_)
# Reduce
proba = all_proba[0]
if self.n_outputs_ == 1:
for j in range(1, len(all_proba)):
proba += all_proba[j]
proba /= len(self.estimators_)
else:
for j in range(1, len(all_proba)):
for k in range(self.n_outputs_):
proba[k] += all_proba[j][k]
for k in range(self.n_outputs_):
proba[k] /= self.n_estimators
return proba
所以我似乎可以使用all_proba
数组轻松访问单树概率。
要实现这个!