How do I define a custom sklearn Pipeline with fixed steps?

Date: 2020-02-13 17:43:07

Tags: python scikit-learn

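I am trying to create a scikit-learn Pipeline object with fixed steps, i.e. a PipelineWithFixedSteps(Pipeline) class that inherits from Pipeline, so that I can instantiate it with a plain PipelineWithFixedSteps() call and keep my code clean.

I noticed that if I create several PipelineWithFixedSteps() instances and set the parameters of one of them, the parameters of all instances are modified.

Is this the expected behavior, or am I missing something? What would be another way to define a shortcut for a pipeline with fixed steps?

I am using sklearn 0.22.1.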

from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

class PipelineWithFixedSteps(Pipeline):    
    def __init__(
        self,
        steps = [
            ('scaler', StandardScaler()),
            ('linear', LinearRegression()),
        ]
    ):
        super().__init__(steps=steps)

a = PipelineWithFixedSteps()
print(a.get_params())

a.set_params(scaler__with_std=False)
print(a.get_params())

# Create a new instance of PipelineWithFixedSteps()
# The new instance has the same parameters as a
b = PipelineWithFixedSteps()
print(b.get_params())

# Set the parameters of b
# The parameters of a are also changed
b.set_params(scaler__with_mean=False)
print(a.get_params())

1 answer:

Answer 0 (score: 3)

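This really has nothing to do with sklearn; it comes down to how Python evaluates default argument values (see, e.g., this question). A default value is evaluated once, when the function is defined, so every instance created without an explicit steps argument shares the same list and the same estimator objects. It sounds like you are trying to do the following: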

class PipelineWithFixedSteps(Pipeline):    
    def __init__(self, steps=None):
        if steps is None:
            steps = [('scaler', StandardScaler()), ('linear', LinearRegression())]
        super().__init__(steps=steps)
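
To see the difference without sklearn in the picture, here is a minimal pure-Python sketch of the same effect. The classes Scaler, SharedDefault and FreshDefault are hypothetical stand-ins made up for this illustration; they are not part of scikit-learn, and Scaler only mimics an estimator with one setting.

```python
class Scaler:
    """Hypothetical stand-in for an estimator with one setting."""

    def __init__(self):
        self.with_std = True


class SharedDefault:
    # The default list is built once, when the def statement runs,
    # so every instance that relies on it gets the very same objects.
    def __init__(self, steps=[("scaler", Scaler())]):
        self.steps = steps


class FreshDefault:
    # None acts as a sentinel; the list is rebuilt on every call.
    def __init__(self, steps=None):
        if steps is None:
            steps = [("scaler", Scaler())]
        self.steps = steps


a, b = SharedDefault(), SharedDefault()
a.steps[0][1].with_std = False
# b was never touched, yet it sees the change made through a
shared_leaks = b.steps[0][1].with_std is False

c, d = FreshDefault(), FreshDefault()
c.steps[0][1].with_std = False
# d owns its own Scaler, so it keeps the original setting
fresh_isolated = d.steps[0][1].with_std is True

print(shared_leaks, fresh_isolated)
```

With the None sentinel, each call builds its own list and its own estimator objects, which is why set_params on one pipeline should no longer affect the others.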