Question

On this page, https://www.kaggle.com/baghern/a-deep-dive-into-sklearn-pipelines, fit_transform is called to transform the data, as shown below:

from sklearn.pipeline import FeatureUnion, Pipeline

feats = FeatureUnion([('text', text), 
                      ('length', length),
                      ('words', words),
                      ('words_not_stopword', words_not_stopword),
                      ('avg_word_length', avg_word_length),
                      ('commas', commas)])

feature_processing = Pipeline([('feats', feats)])
feature_processing.fit_transform(X_train)

When training on top of the feature processing, it only calls fit and then predict:

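import numpy as np
from sklearn.ensemble import RandomForestClassifier

pipeline = Pipeline([
    ('features', feats),
    ('classifier', RandomForestClassifier(random_state=42)),
])

# Fit the feature steps and the classifier on the training data
pipeline.fit(X_train, y_train)

# Predict on the test data and compute the accuracy
preds = pipeline.predict(X_test)
np.mean(preds == y_test)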

The question is: in the second case, does fit also transform X_train (doing what transform would do), given that we never call fit_transform here?

Answer 1

sklearn's Pipeline has some nice features. It performs multiple tasks in a very clean way: the features, their transformations, and the classifier we want to run are all defined in a single object.

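In the first step of this pipeline

pipeline = Pipeline([
    ('features', feats),
    ('classifier', RandomForestClassifier(random_state=42)),
])

you have defined the name of the feature step and its transformer (bundled together in feats). In the second step, you have defined the name of the classifier and the classifier itself.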
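As a small illustration of how the step names map to the objects, here is a minimal sketch. The `feats` below is a hypothetical numeric stand-in (two scalers) for the text-based FeatureUnion in the question, chosen only so the snippet runs on its own.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical stand-in for the `feats` FeatureUnion from the question.
feats = FeatureUnion([
    ("standard", StandardScaler()),
    ("minmax", MinMaxScaler()),
])

pipeline = Pipeline([
    ("features", feats),
    ("classifier", RandomForestClassifier(random_state=42)),
])

# Each step is a (name, estimator) pair, kept in the order given.
step_names = [name for name, _ in pipeline.steps]
print(step_names)  # ['features', 'classifier']

# A step can be looked up by the name it was given.
features_step = pipeline.named_steps["features"]
print(type(features_step).__name__)  # FeatureUnion
```

The names are also what you would use as prefixes when setting parameters, for example `classifier__n_estimators` in `pipeline.set_params`.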

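Now, when pipeline.fit is called, it first fits the feature step and transforms the data with it, then fits the classifier on the transformed features. So it does some of the steps for us: there is no need to call fit_transform yourself. At predict time, the pipeline only transforms X_test with the already fitted feature step and then calls the classifier's predict. You can check here for more.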
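To make the behaviour described in this answer concrete, the sketch below compares `pipeline.fit` followed by `predict` against doing the same steps by hand. It assumes synthetic numeric data from `make_classification` and a hypothetical `make_feats` helper built from two scalers, since the original text features are not available here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler


def make_feats():
    # Hypothetical numeric stand-in for the text-based `feats` in the question.
    return FeatureUnion([
        ("standard", StandardScaler()),
        ("minmax", MinMaxScaler()),
    ])


# Synthetic data (an assumption; the question uses a text dataset).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# Pipeline version: fit handles "fit + transform" of the feature step internally.
pipeline = Pipeline([
    ("features", make_feats()),
    ("classifier", RandomForestClassifier(random_state=42)),
])
pipeline.fit(X_train, y_train)
pipeline_preds = pipeline.predict(X_test)

# Manual version: the same steps written out explicitly.
feats = make_feats()
X_train_t = feats.fit_transform(X_train)   # fit and transform on the training data
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train_t, y_train)                # classifier sees transformed features
X_test_t = feats.transform(X_test)         # transform only, no refit, on the test data
manual_preds = clf.predict(X_test_t)

same = np.array_equal(pipeline_preds, manual_preds)
print(same)
```

With the same data and the same `random_state`, both routes should give identical predictions, which is why the second snippet in the question never needs an explicit fit_transform call.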
