Transferring data between Airflow custom operators

Time: 2020-04-26 13:10:59

Tags: python airflow apache-airflow-xcom

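I am creating my own custom operators in Airflow and want to use the output of one operator as the input to another. Currently, I store the outputs in s3 and read them back from s3 in the next operator, which seems to be an effective way. I came across the post "Airflow and data transfer between operators", but it isn't quite clear to me. I would really appreciate a good example of transferring data between operators.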

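from airflow.models import BaseOperator
from airflow.plugins_manager import AirflowPlugin
from airflow.utils.decorators import apply_defaults


class SFScoreDataOperator(BaseOperator):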
    @apply_defaults
    def __init__(self, aws_key, aws_secret,
                 model_version='2020-04-v2',
                 *args, **kwargs):
        self.__version__ = model_version
        self.aws_key = aws_key
        self.aws_secret = aws_secret
        super(SFScoreDataOperator, self).__init__(*args, **kwargs)

    def execute(self, context):
        SuccessFactorData(aws_key=self.aws_key, aws_secret=self.aws_secret, to_random_shuffle=False).get_data()


class SFScoreGetTrainOperator(BaseOperator):
    @apply_defaults
    def __init__(self, aws_key, aws_secret,
                 model_version='2020-04-v2',
                 *args, **kwargs):
        self.__version__ = model_version
        self.aws_key = aws_key
        self.aws_secret = aws_secret
        super(SFScoreGetTrainOperator, self).__init__(*args, **kwargs)

    def execute(self, context):
        CM = SuccessFactorTrain(aws_key=self.aws_key, aws_secret=self.aws_secret)
        df_to_predict = CM.fetch_predict_data()
        df_train_test = CM.fetch_final_data()
        CM.train_test(final_df_for_train_test=df_train_test, segment_type=None)

class SuccessFactorDataPluginV2(AirflowPlugin):
    name = 'success_factor_data_plugin_v2'
    operators = [SFScoreDataOperator]


class SuccessFactorTrainPluginV2(AirflowPlugin):
    name = 'success_factor_train_plugin_v2'
    operators = [SFScoreGetTrainOperator]

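The get_data method called in the SFScoreDataOperator class outputs two tables, which are the inputs to the train_test method in SFScoreGetTrainOperator. The train_test method in turn outputs 3 variables, which are the inputs to the next custom operator. Not all of the outputs are in CSV format, so I can't write them to s3. I read the XCom documentation but am not sure how to implement it, so I'd appreciate a push in the right direction. Thanks!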
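For reference, here is a minimal sketch of the XCom pattern as I understand it from the docs, not a confirmed solution. The value returned from execute() is pushed by Airflow as an XCom under the key "return_value", and a downstream task can read it with context["ti"].xcom_pull(task_ids=...). The operator names, task ids and s3 paths below are hypothetical placeholders. XCom values live in the metadata database and are meant for small payloads, so the sketch passes storage locations rather than the tables themselves. The BaseOperator fallback and FakeTaskInstance are stand-ins so the snippet can run without an Airflow scheduler; they are not part of Airflow.

```python
try:
    from airflow.models import BaseOperator
except ImportError:
    class BaseOperator:
        """Minimal stand-in so the sketch runs without Airflow installed."""

        def __init__(self, task_id, **kwargs):
            self.task_id = task_id


class ProducerOperator(BaseOperator):
    """Hypothetical upstream operator (plays the role of the data step)."""

    def execute(self, context):
        # Whatever execute() returns is stored by Airflow as an XCom
        # under the key "return_value". Paths are made-up placeholders.
        return {
            "table_a": "s3://my-bucket/table_a.pkl",
            "table_b": "s3://my-bucket/table_b.pkl",
        }


class ConsumerOperator(BaseOperator):
    """Hypothetical downstream operator (plays the role of the train step)."""

    def __init__(self, upstream_task_id, **kwargs):
        super().__init__(**kwargs)
        self.upstream_task_id = upstream_task_id

    def execute(self, context):
        # Pull the upstream task's return value from XCom.
        locations = context["ti"].xcom_pull(task_ids=self.upstream_task_id)
        # Real code would load the objects from these locations here.
        return locations


class FakeTaskInstance:
    """Test double imitating only the XCom calls used above."""

    def __init__(self):
        self._xcoms = {}

    def record_return(self, task_id, value):
        # Mimics Airflow storing a task's return value as an XCom.
        self._xcoms[(task_id, "return_value")] = value

    def xcom_pull(self, task_ids, key="return_value"):
        return self._xcoms.get((task_ids, key))


# Simulate the two tasks running in order, outside of a real DAG run.
ti = FakeTaskInstance()
producer = ProducerOperator(task_id="get_data")
consumer = ConsumerOperator(task_id="train", upstream_task_id="get_data")
ti.record_return("get_data", producer.execute({"ti": ti}))
result = consumer.execute({"ti": ti})
```

In a real DAG the scheduler supplies the context and stores the return value, so only the two execute() methods would matter. Outputs that are not CSV could presumably be pickled to s3 and referenced the same way, but that part is an assumption on my side.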

0 answers:

No answers