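I am creating my own custom operators in Airflow and want to use the output of one operator as the input to another. Currently I store the outputs in S3 and read them back from S3 in the next operator, which seems to be a workable approach. I came across the post Airflow and data transfer between operators, but it wasn't very clear to me. I'd be very grateful if someone could give me a good example of transferring data between operators.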
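The S3 hand-off described above works for any Python object, not only CSV tables, if each output is serialized to bytes first. A minimal sketch, assuming pickle is acceptable (both tasks run compatible Python and library versions, and the data is trusted); a local directory stands in for the S3 bucket, and `save_output` / `load_output` are hypothetical helper names:

```python
import pickle
import tempfile
from pathlib import Path


def save_output(obj, root, run_id, name):
    """Pickle obj to root/<run_id>/<name>.pkl and return that location.

    The local directory stands in for an S3 bucket; with boto3 the write would be
    roughly s3.put_object(Bucket=bucket, Key=key, Body=pickle.dumps(obj)).
    Keying by run_id (e.g. Airflow's context['run_id']) keeps runs from
    overwriting each other.
    """
    path = Path(root) / run_id / f"{name}.pkl"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(pickle.dumps(obj))
    return str(path)


def load_output(location):
    """Read back an object written by save_output.

    With boto3: pickle.loads(s3.get_object(Bucket=bucket, Key=key)["Body"].read()).
    """
    return pickle.loads(Path(location).read_bytes())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Non-tabular values (a set and a tuple inside a dict) survive the round trip.
        outputs = {"params": {"segments": {"a", "b"}, "split": (0.8, 0.2)}}
        loc = save_output(outputs, root, "2020-04-01", "params")
        print(load_output(loc) == outputs)  # True
```

Pickles can fail to load after incompatible library upgrades (pandas, scikit-learn), so this fits hand-offs within one pipeline run better than long-term storage.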
```python
from airflow.models import BaseOperator
from airflow.plugins_manager import AirflowPlugin
from airflow.utils.decorators import apply_defaults

# SuccessFactorData and SuccessFactorTrain are my own classes (defined elsewhere; import omitted).


class SFScoreDataOperator(BaseOperator):
    @apply_defaults
    def __init__(self, aws_key, aws_secret,
                 model_version='2020-04-v2',
                 *args, **kwargs):
        self.__version__ = model_version
        self.aws_key = aws_key
        self.aws_secret = aws_secret
        super(SFScoreDataOperator, self).__init__(*args, **kwargs)

    def execute(self, context):
        SuccessFactorData(aws_key=self.aws_key, aws_secret=self.aws_secret, to_random_shuffle=False).get_data()


class SFScoreGetTrainOperator(BaseOperator):
    @apply_defaults
    def __init__(self, aws_key, aws_secret,
                 model_version='2020-04-v2',
                 *args, **kwargs):
        self.__version__ = model_version
        self.aws_key = aws_key
        self.aws_secret = aws_secret
        super(SFScoreGetTrainOperator, self).__init__(*args, **kwargs)

    def execute(self, context):
        CM = SuccessFactorTrain(aws_key=self.aws_key, aws_secret=self.aws_secret)
        df_to_predict = CM.fetch_predict_data()
        df_train_test = CM.fetch_final_data()
        CM.train_test(final_df_for_train_test=df_train_test, segment_type=None)


class SuccessFactorDataPluginV2(AirflowPlugin):
    name = 'success_factor_data_plugin_v2'
    operators = [SFScoreDataOperator]


class SuccessFactorTrainPluginV2(AirflowPlugin):
    name = 'success_factor_train_plugin_v2'
    operators = [SFScoreGetTrainOperator]
```
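A minimal sketch of one way to pass data between the two operators above through XCom. As I understand Airflow's behavior, a non-None value returned from `execute` is pushed to XCom under the key `return_value` (newer 1.10 releases gate this on `do_xcom_push`, which defaults to True), and a downstream operator reads it with `context['ti'].xcom_pull(task_ids=...)`, where `task_ids` is the upstream task's `task_id` in the DAG. XCom rows live in the metadata database, so the sketch pushes only small references (S3 locations), not the tables. `StubTaskInstance` and `run_task` are hypothetical stand-ins for Airflow's `TaskInstance` and scheduler so the code runs without Airflow; the task ids, table names, and bucket are made up.

```python
class StubTaskInstance:
    """Hypothetical in-memory stand-in for the XCom methods of Airflow's TaskInstance."""

    def __init__(self, store, task_id):
        self._store = store  # shared dict: (task_id, key) -> value
        self.task_id = task_id

    def xcom_push(self, key, value):
        self._store[(self.task_id, key)] = value

    def xcom_pull(self, task_ids, key="return_value"):
        # Only the single-task_id form is modelled here.
        return self._store.get((task_ids, key))


def score_data_execute(context):
    # Would be the body of SFScoreDataOperator.execute: write the two tables
    # from get_data() to S3, then return only their (hypothetical) locations.
    return {"table_a": "s3://my-bucket/run1/table_a.pkl",
            "table_b": "s3://my-bucket/run1/table_b.pkl"}


def get_train_execute(context):
    # Would be the body of SFScoreGetTrainOperator.execute: pull what the
    # upstream task returned, load both tables, then call train_test.
    locations = context["ti"].xcom_pull(task_ids="score_data")
    return sorted(locations)


def run_task(store, task_id, fn):
    """Mimic what Airflow does with execute()'s return value."""
    ti = StubTaskInstance(store, task_id)
    result = fn({"ti": ti})
    if result is not None:
        ti.xcom_push("return_value", result)
    return result


if __name__ == "__main__":
    store = {}
    run_task(store, "score_data", score_data_execute)
    print(run_task(store, "get_train", get_train_execute))  # ['table_a', 'table_b']
```

In the real DAG, the upstream task would be created as something like `SFScoreDataOperator(task_id='score_data', ...)` and set upstream of the training task, so the `task_ids` passed to `xcom_pull` matches it.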
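The `get_data` method called in SFScoreDataOperator outputs two tables, which are the inputs to the `train_test` method in SFScoreGetTrainOperator; `train_test` in turn outputs 3 variables that are the inputs to the next custom operator. Not all of these outputs are in CSV format, so I can't write them to S3. I read the XCom documentation but I'm not sure how to implement it, so I'd appreciate a nudge in the right direction. Thanks!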
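For the three values `train_test` returns, whether each can go through XCom directly depends on its type and size: Airflow serializes XCom values as JSON unless `enable_xcom_pickling` is on (on by default in 1.10, off in 2.x, as far as I know). A sketch under assumptions: the output names (`metrics`, `threshold`, `model`) and the 1024-byte limit are hypothetical; small JSON-serializable values are embedded inline, and anything else is handed to a `save` callable (for example an S3 upload) and replaced by the location it returns.

```python
import json


def build_xcom_payload(outputs, save, max_inline_bytes=1024):
    """Turn named outputs into a JSON-safe dict suitable for one XCom value.

    Small JSON-serializable values become {"inline": value}; anything else
    (models, DataFrames, ...) goes to save(name, value), which must persist it
    and return a location string, stored as {"ref": location}.
    """
    payload = {}
    for name, value in outputs.items():
        try:
            encoded = json.dumps(value)
        except (TypeError, ValueError):
            encoded = None
        if encoded is not None and len(encoded.encode("utf-8")) <= max_inline_bytes:
            payload[name] = {"inline": value}
        else:
            payload[name] = {"ref": save(name, value)}
    return payload


def resolve_xcom_payload(payload, load):
    """Inverse of build_xcom_payload; load(location) fetches a stored value."""
    return {
        name: entry["inline"] if "inline" in entry else load(entry["ref"])
        for name, entry in payload.items()
    }


if __name__ == "__main__":
    saved = {}  # in-memory stand-in for S3

    def save(name, value):
        key = f"run1/{name}.pkl"
        saved[key] = value
        return key

    outputs = {"metrics": {"auc": 0.81}, "threshold": 0.5, "model": object()}
    payload = build_xcom_payload(outputs, save)
    print(json.dumps(payload))  # metrics and threshold inline, model as a ref
    restored = resolve_xcom_payload(payload, saved.__getitem__)
```

The training operator's `execute` could return `build_xcom_payload(...)`, and the next operator would call `resolve_xcom_payload` on what it pulls, so the XCom row stays small and JSON-safe under either serialization setting.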