Question

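If I create a Pipeline in sklearn where the first step is a transformation (Imputer) and the second step fits a RandomForestClassifier with the keyword argument warm_start set to True, how do I call the RandomForestClassifier successively? What does warm_start=True do when it is embedded in a Pipeline?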

http://scikit-learn.org/0.18/auto_examples/missing_values.html

Answer 1

Yes, but the pipeline part gets a bit complicated.

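warm_start only has an effect when you increase the n_estimators of the RandomForestClassifier: the trees that already exist are kept, and only the additional ones are trained.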

See this warning in the scikit-learn source:

        warn("Warm-start fitting without increasing n_estimators does not "
             "fit new trees.")
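The effect of that check can be seen on a standalone RandomForestClassifier. This sketch uses a synthetic dataset from make_classification (an assumption, purely for illustration): raising n_estimators from 10 to 20 trains 10 extra trees and keeps the original ones, while refitting without raising it adds no trees and emits the warning quoted above.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic dataset, used only for illustration
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = RandomForestClassifier(n_estimators=10, warm_start=True, random_state=0)
clf.fit(X, y)
first_trees = list(clf.estimators_)  # the 10 trees from the first fit

# Increasing n_estimators: only 10 new trees are trained, the old ones are kept
clf.set_params(n_estimators=20)
clf.fit(X, y)
print(len(clf.estimators_))  # 20
print(all(a is b for a, b in zip(first_trees, clf.estimators_)))  # True

# Refitting without increasing n_estimators: no new trees, only a warning
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    clf.fit(X, y)
print(len(clf.estimators_))  # still 20
print(any("does not fit new trees" in str(w.message) for w in caught))  # True
```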

So you need to increase n_estimators of the RandomForestClassifier inside the pipeline.

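To do this, you first need to access the RandomForestClassifier estimator from the pipeline and then set n_estimators as needed. However, when you call fit() on the pipeline, the imputer step is still fitted again, every time.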

For example, consider the following pipeline:

pipe = Pipeline([('imputer', Imputer()),
                 # n_estimators set explicitly: its default became 100 in scikit-learn 0.22
                 ('clf', RandomForestClassifier(n_estimators=10, warm_start=True))])

Now, for your question, this is what you need to do to use warm_start:

# Fit the data initially
pipe.fit(X, y)

# Increase n_estimators (use either one of the two lines below)
pipe.set_params(clf__n_estimators=30)
# OR
# pipe.named_steps['clf'].n_estimators = 30

# Fit the same data again or new data
pipe.fit(X_new, y_new)
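The steps above can be checked end to end with a runnable sketch. It assumes a recent scikit-learn, so it swaps in SimpleImputer (Imputer was removed in 0.22), builds synthetic data with make_classification and blanks out roughly 10% of the values (make_data is a hypothetical helper), and sets n_estimators=10 explicitly because the default is 100 in newer versions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline

rng = np.random.RandomState(0)

def make_data(seed):
    """Hypothetical helper: synthetic data with about 10% missing values."""
    X, y = make_classification(n_samples=200, n_features=5, random_state=seed)
    X[rng.rand(*X.shape) < 0.1] = np.nan
    return X, y

X, y = make_data(0)
X_new, y_new = make_data(1)

pipe = Pipeline([('imputer', SimpleImputer()),
                 ('clf', RandomForestClassifier(n_estimators=10, warm_start=True,
                                                random_state=0))])

# Initial fit: 10 trees
pipe.fit(X, y)
print(len(pipe.named_steps['clf'].estimators_))  # 10

# Raise n_estimators through the pipeline, then fit again: 20 more trees
pipe.set_params(clf__n_estimators=30)
pipe.fit(X_new, y_new)
print(len(pipe.named_steps['clf'].estimators_))  # 30
```

Only the 20 additional trees are trained on the second call and the first 10 are kept, but the imputer step is refitted on X_new.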

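On the first call to pipe.fit(), the imputer is fitted to the given data (X, y). On the second call to fit(), one of two things happens, depending on the data:

- If the same data is supplied again, the imputer is still refitted, which is unnecessary.
- If the data is different, the imputer is fitted on the new data and forgets what it learned before. Missing values in the new data will therefore be imputed differently from how they were imputed in the previous data. I don't think that is what you want.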
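To make the second case concrete, this sketch uses two tiny hand-made batches (assumed data, not from the question) whose column means differ, and inspects SimpleImputer.statistics_ after each pipe.fit call. The last lines show one possible way around the refit, which is an assumption of this sketch rather than something the answer prescribes: transform the new batch with the already-fitted imputer and call fit on the classifier step directly, so that only the forest is warm-started.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline

# Tiny hand-made batches whose column means differ
X = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 6.0], [5.0, 8.0]])
y = np.array([0, 1, 0, 1])
X_new = np.array([[10.0, 20.0], [np.nan, 40.0], [30.0, np.nan], [50.0, 60.0]])
y_new = np.array([1, 0, 1, 0])

pipe = Pipeline([('imputer', SimpleImputer()),
                 ('clf', RandomForestClassifier(n_estimators=10, warm_start=True,
                                                random_state=0))])
pipe.fit(X, y)
old_means = pipe.named_steps['imputer'].statistics_.copy()
print(old_means)  # [3. 6.]

# A second pipe.fit refits the imputer: the means learned from X are replaced
pipe.set_params(clf__n_estimators=20)
pipe.fit(X_new, y_new)
print(pipe.named_steps['imputer'].statistics_)  # [30. 40.]

# Possible workaround (an assumption): keep the fitted imputer as it is and
# warm-start only the classifier step on the transformed new batch
imputer = pipe.named_steps['imputer']
clf = pipe.named_steps['clf']
clf.set_params(n_estimators=30)
clf.fit(imputer.transform(X_new), y_new)
print(imputer.statistics_)    # unchanged: [30. 40.]
print(len(clf.estimators_))   # 30
```

Because clf is the same object held by the pipeline, pipe.predict afterwards uses the 30-tree forest together with the imputer that was left untouched by the last fit.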
