Airflow: How do I template or pass the output of a Python callable as arguments to other tasks?

Asked: 2019-03-20 22:20:12

Tags: airflow apache-airflow-xcom

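I'm new to Airflow and am working on making my ETL pipeline more reusable. Originally, I had a few lines of top-level code that determined job_start from some user-supplied parameters, but after a lot of searching I found that top-level code is executed on every heartbeat, which was causing my tables to be truncated.

Now I'm looking into wrapping this top-level code in a Python callable so that it is protected from those refreshes, but I'm unsure of the best way to pass its output to my other tasks. The gist of my code is below: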

def get_job_dts():
    # Do something to determine the appropriate job_start_dt and job_end_dt
    # Package up as a list as inputs to other PythonCallables using op_args
    job_params = [job_start_dt, job_end_dt]
    return job_params

t0 = PythonOperator(
    task_id='get_dates',
    python_callable=get_job_dts,
    dag=dag
)

t1 = PythonOperator(
    task_id='task_1',
    python_callable=first_task,
    op_args=job_params,  # <-- How do I send job_params to op_args??
    dag=dag
)

t0 >> t1

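I've searched around and read about Jinja templates, Variables, and XComs, but I'm lost on how to implement any of them. Can anyone provide an example of saving a list to a variable that other tasks can use?

3 answers:

Answer 0 (score: 1)

The best approach is to push the value into XCom in get_job_dts and pull it back out of XCom in first_task.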

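# Import path for Airflow 1.x, where provide_context is required
from airflow.operators.python_operator import PythonOperator

def get_job_dts(**kwargs):
    # Do something to determine the appropriate job_start_dt and job_end_dt
    # Package up as a list as inputs to other PythonCallables
    job_params = [job_start_dt, job_end_dt]

    # Push job_params into XCom under an explicit key
    kwargs['ti'].xcom_push(key='job_params', value=job_params)
    return job_params


def first_task(ti, **kwargs):
    # Pull job_params from XCom
    job_params = ti.xcom_pull(task_ids='get_dates', key='job_params')
    # And then do the rest


t0 = PythonOperator(
    task_id='get_dates',
    provide_context=True,  # needed so that kwargs['ti'] is available
    python_callable=get_job_dts,
    dag=dag
)

# No op_args here: job_params does not exist when the DAG file is parsed,
# so first_task reads it from XCom at run time instead.
t1 = PythonOperator(
    task_id='task_1',
    provide_context=True,
    python_callable=first_task,
    dag=dag
)

t0 >> t1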
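To make the push/pull mechanics concrete, here is a minimal plain-Python sketch of the two ways a value can end up in XCom: an explicit push under a custom key, and the implicit push of a callable's return value. ToyXCom and run_task are hypothetical names invented for this illustration and are not part of Airflow; the only details taken from Airflow are that the return value is stored under the key 'return_value' and that a None return is not pushed. The dates are placeholders.

```python
# Toy in-memory model of XCom semantics. This is NOT Airflow's implementation.
RETURN_KEY = "return_value"  # key Airflow uses for a callable's return value


class ToyXCom:
    def __init__(self):
        self._store = {}  # maps (task_id, key) -> value

    def push(self, task_id, key, value):
        self._store[(task_id, key)] = value

    def pull(self, task_ids, key=RETURN_KEY):
        # Returns None when nothing was pushed for this task and key.
        return self._store.get((task_ids, key))


def run_task(xcom, task_id, fn):
    """Hypothetical runner: call fn, then push its non-None return value."""
    result = fn(xcom, task_id)
    if result is not None:
        xcom.push(task_id, RETURN_KEY, result)
    return result


def get_job_dts(xcom, task_id):
    job_params = ["2019-03-01", "2019-03-20"]  # placeholder dates
    xcom.push(task_id, "job_params", job_params)  # explicit push, custom key
    return job_params  # the runner pushes this under RETURN_KEY


xcom = ToyXCom()
run_task(xcom, "get_dates", get_job_dts)

explicit = xcom.pull("get_dates", key="job_params")  # pull by custom key
implicit = xcom.pull("get_dates")                    # pull the return value
```

In this sketch both pulls yield the same list, which suggests that a callable that already returns job_params may not need the explicit xcom_push at all, provided the downstream task pulls by task id.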

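Answer 1 (score: 0)

As RyantheCoder mentioned, XCom is the way to go. My implementation follows the tutorial, where the push happens implicitly from the return value of the Python callable.

I'm still confused about the difference between passing (ti, **kwargs) and using (**context) in the function that does the pull. Also, where does "ti" come from?

Any clarification is appreciated.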

def get_job_dts(**kwargs):
    # Do something to determine the appropriate job_start_dt and job_end_dt
    # Package up as a list as inputs to other PythonCallables
    job_params = [job_start_dt, job_end_dt]

    # The return value is automatically pushed to XCom, see the Airflow XCom docs:
    # https://airflow.apache.org/concepts.html?highlight=xcom#xcoms
    return job_params

def first_task(**context):
    # Change task_ids to whichever task pushed the XCom value you need;
    # the rest is standard notation
    job_params = context['task_instance'].xcom_pull(task_ids='get_dates')

    # And then do the rest


t0 = PythonOperator(
    task_id='get_dates',
    python_callable=get_job_dts,
    dag=dag
)

t1 = PythonOperator(
    task_id='task_1',
    provide_context=True,
    python_callable=first_task,
    dag=dag
)

t0 >> t1
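On the (ti, **kwargs) versus (**context) question: with provide_context=True, Airflow calls the callable with the task context unpacked as keyword arguments, and that context contains the running TaskInstance under both 'ti' and 'task_instance'. The difference between the two signatures is therefore ordinary Python keyword binding. The sketch below shows this without importing Airflow; fake_context is a hypothetical stand-in for the real context dict, and its values are placeholder strings.

```python
# Plain-Python illustration of keyword binding; no Airflow needed.
# fake_context is a hypothetical stand-in for the context Airflow builds.
fake_context = {
    "ti": "<TaskInstance>",
    "task_instance": "<TaskInstance>",
    "ds": "2019-03-20",
}


def first_task(ti, **kwargs):
    # 'ti' is bound to the named parameter; every other key lands in kwargs.
    return ti, sorted(kwargs)


def first_task_ctx(**context):
    # No named parameter, so 'ti' stays inside the dict with everything else.
    return context["ti"], sorted(context)


by_name = first_task(**fake_context)
by_dict = first_task_ctx(**fake_context)
```

Both functions reach the same object. Naming ti in the signature is just a shortcut for context['ti'], and the name of the catch-all parameter (kwargs or context) has no special meaning.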

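Answer 2 (score: 0)

Since you mention changing the task start and end times dynamically, I think what you need is to create a dynamic DAG rather than just pass args to the DAG. In particular, changing the start time and interval without changing the DAG name leads to unexpected results, and doing so is strongly discouraged. You can refer to this link to see whether that strategy helps.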