Question

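I'm trying to use a pipeline for a simple regression task in which I specify the degree of the polynomial used for the regression (degree = 3). So I define: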

pipe = make_pipeline(PolynomialFeatures(3), BayesianRidge())

Then I fit it:

pipe.fit(X_train, y_train)

And finally, the prediction part:

y_pred = pipe.predict(X_test)

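scikit-learn's BayesianRidge() has a return_std parameter on its predict method; when it is set to True, predict also returns the standard deviation of the predictive distribution at the query points.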
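For reference, a minimal sketch of that API used directly, outside any pipeline. The toy 1-D data and the variable names here are illustrative assumptions, not from the question:

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# Illustrative toy data (assumption): y is roughly linear in x with small noise
rng = np.random.default_rng(0)
X_train = np.linspace(0, 1, 20).reshape(-1, 1)
y_train = 2.0 * X_train.ravel() + rng.normal(scale=0.1, size=20)
X_test = np.array([[0.25], [0.75]])

reg = BayesianRidge().fit(X_train, y_train)

# With return_std=True, predict returns a (mean, std) tuple instead of just the mean
y_mean, y_std = reg.predict(X_test, return_std=True)
print(y_mean.shape, y_std.shape)  # (2,) (2,)
```

Called on the estimator itself, this works without any tricks; the question is only how to pass the flag through the pipeline.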

Is there any way to get this standard deviation array through the pipeline?

Answer 1

You need to install the latest version of scikit-learn from their GitHub repository. After that, you just need to use partial from functools. The example I used is similar to the one in the Bayesian Ridge Regression docs.

from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from functools import partial

clf = linear_model.BayesianRidge()

# Make the pipeline
pipe = make_pipeline(PolynomialFeatures(3), clf)

# Patch the regressor's predict method with partial so it always returns the std
clf.predict = partial(clf.predict, return_std=True)

# Fit the pipeline
pipe.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])

# Retrieve the prediction and standard deviation
y_pred, y_std = pipe.predict([[1, 2]])
# Output: (array([ 1.547614]), array([ 0.25034696]))
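The patch works because partial pre-binds return_std=True, and assigning the result to the instance attribute predict shadows the class's method, so any code that later calls clf.predict(X) (here, the pipeline) gets the keyword automatically. The plain-Python sketch below shows the same mechanism; Model and caller are hypothetical stand-ins for the estimator and the pipeline:

```python
from functools import partial


class Model:
    """Hypothetical stand-in for an estimator whose predict accepts an optional flag."""

    def predict(self, X, return_std=False):
        mean = [2 * x for x in X]
        if return_std:
            std = [0.1 for _ in X]
            return mean, std
        return mean


def caller(model, X):
    """Hypothetical stand-in for a pipeline that calls predict(X) with no extra keywords."""
    return model.predict(X)


m = Model()
print(caller(m, [1, 2]))  # [2, 4]

# Bind return_std=True; the instance attribute now shadows the class method
m.predict = partial(m.predict, return_std=True)
print(caller(m, [1, 2]))  # ([2, 4], [0.1, 0.1])
```

Note that the patch lives on that one object only: a freshly constructed estimator (or a clone) will not carry it.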

Note: Apparently this was a bug in sklearn's pipeline module, described here. It has now been fixed in the latest version.
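In scikit-learn releases that include this fix, Pipeline.predict should forward extra keyword arguments to the final estimator's predict, so the partial patch is no longer needed. A minimal sketch, assuming such a version (and one that supports pipeline slicing, used in the second option), with the same toy data as above. The second option transforms the input with every step except the last and then calls the final estimator directly, so it does not rely on keyword forwarding:

```python
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same toy data as in the answer's example
X_train = [[0, 0], [1, 1], [2, 2]]
y_train = [0, 1, 2]
X_new = [[1, 2]]

pipe = make_pipeline(PolynomialFeatures(3), BayesianRidge())
pipe.fit(X_train, y_train)

# Option 1: extra keyword arguments are passed on to the final step's predict
y_pred, y_std = pipe.predict(X_new, return_std=True)

# Option 2: apply all steps but the last, then call the final estimator directly
X_new_poly = pipe[:-1].transform(X_new)
y_pred2, y_std2 = pipe[-1].predict(X_new_poly, return_std=True)

print(np.allclose(y_pred, y_pred2), np.allclose(y_std, y_std2))  # True True
```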

References:

How partial works in Python

Standard deviation of the predictive distribution at query points using a pipeline