Question

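I want to build a simple pipeline with Neuraxle (I know I could use other libraries, but I want to use Neuraxle) in which I clean the data, split it, train 2 models, and compare them.

I would like my pipeline to do something like this: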

p = Pipeline([
    PreprocessData(),
    SplitData(),
    # (some magic to start the training of both models with the split of the previous step)
    ("model1", model1(params)),
    ("model2", model2(params)),
    # (evaluate)
])

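I don't know whether this is even possible, because I couldn't find anything about it in the documentation.

I also tried using models that are not from sklearn (e.g. catboost, xgboost ...), but I get the error

AttributeError: 'CatBoostRegressor' object has no attribute 'setup'

I thought about writing my own class for the model, but then I would not be able to use Neuraxle's hyperparameter search.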

Answer 1

Yes! You can do something like this:

p = Pipeline([
    PreprocessData(),
    ColumnTransformer([
        (0, model1(params)),  # Model 1 will receive Column 0 of data
        ([1, 2], model2(params)),  # Model 2 will receive Column 1 and 2 of data
    ], n_dimension=2, n_jobs=2),
    (evaluate)
])

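The data flow will be split in two.

n_jobs=2 is meant to create two threads. You can also pass a custom class through the joiner argument to put the data back together. We will release some changes soon, after which this should work properly. For now, the pipeline uses 1 thread.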
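To make the split-then-join idea concrete, here is a minimal plain-Python sketch of what "send some columns to one step, other columns to another, then join" means. It is not Neuraxle's implementation: the names select_columns, split_apply_join, double and row_sum are hypothetical, and the joiner shown here (concatenating each step's output row by row) is an assumption made for illustration only.

```python
from typing import Callable, List, Sequence, Tuple, Union

Row = List[float]
Columns = Union[int, Sequence[int]]
Step = Callable[[List[Row]], List[Row]]


def select_columns(rows: List[Row], columns: Columns) -> List[Row]:
    """Keep only the requested column(s) of every row."""
    indices = [columns] if isinstance(columns, int) else list(columns)
    return [[row[i] for i in indices] for row in rows]


def split_apply_join(rows: List[Row],
                     column_steps: List[Tuple[Columns, Step]]) -> List[Row]:
    """Send each column group to its own step, then join outputs row by row."""
    outputs = [step(select_columns(rows, columns))
               for columns, step in column_steps]
    # Hypothetical joiner: concatenate every step's features side by side.
    return [sum(parts, []) for parts in zip(*outputs)]


def double(rows: List[Row]) -> List[Row]:
    """Toy stand-in for the first model: doubles every value."""
    return [[2 * value for value in row] for row in rows]


def row_sum(rows: List[Row]) -> List[Row]:
    """Toy stand-in for the second model: sums each row."""
    return [[sum(row)] for row in rows]


data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
# Column 0 goes to `double`, columns 1 and 2 go to `row_sum`.
joined = split_apply_join(data, [(0, double), ([1, 2], row_sum)])
print(joined)  # [[2.0, 5.0], [8.0, 11.0]]
```

In the real pipeline, the two toy functions would be replaced by the two models, and the joining behavior would come from whatever is passed as the joiner.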

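For a model such as CatBoostRegressor, which looks like a scikit-learn estimator but does not come from scikit-learn, could you try declaring it in the pipeline as SKLearnWrapper(model1(params)) instead of simply model1(params)? Even if your object has the same API as scikit-learn's BaseEstimator, Neuraxle may not recognize it as a scikit-learn model (that is, as a BaseEstimator object from scikit-learn). You may therefore need to wrap the model in SKLearnWrapper manually, or write your own similar wrapper to adapt your class to Neuraxle.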
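The following sketch illustrates why a type-based check can miss a model that merely shares the scikit-learn API. It uses stand-in classes only: BaseEstimator here is a hypothetical placeholder for sklearn.base.BaseEstimator, ThirdPartyRegressor plays the role of a model like CatBoostRegressor, and is_recognized is an assumed simplification, not Neuraxle's actual detection code.

```python
class BaseEstimator:
    """Hypothetical stand-in for sklearn.base.BaseEstimator."""


class MeanRegressor(BaseEstimator):
    """A model that inherits from the (stand-in) BaseEstimator."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class ThirdPartyRegressor:
    """Same fit/predict API, but no BaseEstimator parent class."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def is_recognized(model) -> bool:
    """Assumed type-based check: only BaseEstimator subclasses pass."""
    return isinstance(model, BaseEstimator)


def has_estimator_api(model) -> bool:
    """Duck-typing check: does the object expose fit and predict?"""
    return all(callable(getattr(model, name, None))
               for name in ("fit", "predict"))


for model in (MeanRegressor(), ThirdPartyRegressor()):
    print(type(model).__name__, is_recognized(model), has_estimator_api(model))
```

Both models behave identically, yet only the first passes the type check. That is the situation in which wrapping the model explicitly, as suggested above, becomes necessary.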

Related: https://stackoverflow.com/a/60302366/2476920

Edit:

You can use Neuraxle's ParallelQueuedFeatureUnion class. An example is coming soon.

See also this parallel pipeline usage example: https://www.neuraxle.org/stable/examples/parallel/plot_streaming_pipeline.html#sphx-glr-examples-parallel-plot-streaming-pipeline-py
