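I'm new to Airflow and working on making my ETL pipeline more reusable. Originally, I had a few lines of top-level code that determined job_start from some user input parameters, but after a lot of searching I found that this code runs at every heartbeat, which was causing my table to be truncated.
Now I'm looking into wrapping this top-level code in a Python callable so that it is safe from the refresh, but I'm not sure of the best way to pass its output to other tasks. The gist of my code is below: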
def get_job_dts():
    # Do something to determine the appropriate job_start_dt and job_end_dt
    # Package up as a list as inputs to other PythonCallables using op_args
    job_params = [job_start_dt, job_end_dt]
    return job_params

t0 = PythonOperator(
    task_id='get_dates',
    python_callable=get_job_dts,
    dag=dag,
)

t1 = PythonOperator(
    task_id='task_1',
    python_callable=first_task,
    op_args=job_params,  # <-- How do I send job_params to op_args??
    dag=dag,
)

t0 >> t1
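I've searched around and read about Jinja templates, Variables, and XComs, but I'm fuzzy on how to implement any of them. Could anyone provide an example of saving that list somewhere other tasks can read it?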
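To illustrate the top-level code problem described in the question, here is a minimal sketch in plain Python with no Airflow involved. It assumes that each scheduler parse amounts to re-executing the DAG file, which is simulated here with exec. DAG_FILE_SOURCE, the calls list, and the dates are all hypothetical placeholders.

```python
# Toy stand-in for a DAG file. Anything at module level runs on every parse;
# the body of a function runs only when something actually calls it.
DAG_FILE_SOURCE = '''
calls.append("top-level code ran")

def get_job_dts():
    calls.append("callable ran")
    return ["2020-01-01", "2020-01-02"]  # placeholder dates
'''

calls = []
namespace = {}

# Simulate three parses of the same file.
for _ in range(3):
    namespace = {"calls": calls}
    exec(DAG_FILE_SOURCE, namespace)

parse_side_effects = calls.count("top-level code ran")

# Simulate one task run that invokes the callable.
namespace["get_job_dts"]()
callable_side_effects = calls.count("callable ran")

print(parse_side_effects, callable_side_effects)
```

The side effect at module level fires once per parse, while the one inside the function fires only when the task runs, which is why moving the date logic into a callable avoids the repeated truncation.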
Answer 0: (score: 1)
The best way to do this is to push the value into XCom in get_job_dts and pull it back from XCom in first_task.
from airflow.operators.python_operator import PythonOperator  # Airflow 1.x import path

def get_job_dts(**kwargs):
    # Do something to determine the appropriate job_start_dt and job_end_dt
    job_params = [job_start_dt, job_end_dt]
    # Push job_params into XCom under an explicit key
    kwargs['ti'].xcom_push(key='job_params', value=job_params)
    return job_params

def first_task(ti, **kwargs):
    # Pull job_params from XCom
    job_params = ti.xcom_pull(task_ids='get_dates', key='job_params')
    # And then do the rest

t0 = PythonOperator(
    task_id='get_dates',
    provide_context=True,  # required so that kwargs['ti'] is available
    python_callable=get_job_dts,
    dag=dag,
)

# op_args is not needed here: job_params does not exist when the DAG file
# is parsed, and the values reach first_task through XCom instead.
t1 = PythonOperator(
    task_id='task_1',
    provide_context=True,
    python_callable=first_task,
    dag=dag,
)

t0 >> t1
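For anyone who wants to see the push/pull flow without a running scheduler, here is a minimal sketch. FakeTaskInstance is a hypothetical in-memory stand-in written for this illustration, not Airflow's TaskInstance, and it only mimics the key and task_ids arguments of xcom_push and xcom_pull. The dates are placeholders.

```python
class FakeTaskInstance:
    """Hypothetical in-memory stand-in for a task instance (not the Airflow class)."""

    # Shared by all instances, playing the role of the XCom table.
    _store = {}

    def __init__(self, task_id):
        self.task_id = task_id

    def xcom_push(self, key, value):
        # Values are stored per (pushing task, key).
        self._store[(self.task_id, key)] = value

    def xcom_pull(self, task_ids, key="return_value"):
        # Look up what the named upstream task pushed under this key.
        return self._store.get((task_ids, key))


def get_job_dts(**kwargs):
    job_params = ["2020-01-01", "2020-01-02"]  # placeholder dates
    kwargs["ti"].xcom_push(key="job_params", value=job_params)


def first_task(ti, **kwargs):
    return ti.xcom_pull(task_ids="get_dates", key="job_params")


# Each task gets its own task instance, but both see the same store.
get_job_dts(ti=FakeTaskInstance("get_dates"))
pulled = first_task(ti=FakeTaskInstance("task_1"))
print(pulled)
```

The point of the sketch is that the two callables never share a Python variable: the only link between them is the stored value, looked up by the upstream task_id and the key.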
Answer 1: (score: 0)
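As RyantheCoder said, XCom is the way to go. My implementation is adapted from the tutorial: it relies on the implicit push that happens automatically when the Python callable returns a value.
I'm still confused about the difference between declaring the pulling function as (ti, **kwargs) and declaring it as (**context). Also, where does "ti" come from?
Any clarification is appreciated.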
from airflow.operators.python_operator import PythonOperator  # Airflow 1.x import path

def get_job_dts(**kwargs):
    # Do something to determine the appropriate job_start_dt and job_end_dt
    job_params = [job_start_dt, job_end_dt]
    # Automatically pushes to XCOM, refer to: Airflow XCOM tutorial: https://airflow.apache.org/concepts.html?highlight=xcom#xcoms
    return job_params

def first_task(**context):
    # Change task_ids to whatever task pushed the XCOM vars you need, rest are standard notation
    job_params = context['task_instance'].xcom_pull(task_ids='get_dates')
    # And then do the rest

t0 = PythonOperator(
    task_id='get_dates',
    python_callable=get_job_dts,
    dag=dag,
)

t1 = PythonOperator(
    task_id='task_1',
    provide_context=True,
    python_callable=first_task,
    dag=dag,
)

t0 >> t1
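On the (ti, **kwargs) versus (**context) question: as far as I understand it, with provide_context=True the operator calls the function with the task's context dictionary unpacked as keyword arguments, and that dictionary holds the current task instance under both 'ti' and 'task_instance'. The difference between the two signatures is then just ordinary Python argument binding. A minimal sketch with no Airflow involved, where fake_ti and the context dictionary are hypothetical stand-ins:

```python
def with_named_ti(ti, **kwargs):
    # 'ti' is bound to the named parameter; every other key lands in kwargs.
    return ti, sorted(kwargs)


def with_context(**context):
    # Nothing is named, so every key, including 'ti', lands in context.
    return context["task_instance"], sorted(context)


# Stand-in for the context; the real one has many more keys.
fake_ti = object()
context = {"ti": fake_ti, "task_instance": fake_ti, "ds": "2020-01-01"}

ti_a, rest_a = with_named_ti(**context)
ti_b, rest_b = with_context(**context)

print(ti_a is ti_b)
print(rest_a)
print(rest_b)
```

Both functions end up with the same object, so the choice is a matter of style: name ti in the signature if that is all you need, or take the whole dictionary if you want other context values as well.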
Answer 2: (score: 0)
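Since you mention changing the task start and end times dynamically, I think what you need is a dynamically created DAG rather than just passing args to a DAG. In particular, changing the start time and interval without changing the DAG name leads to unexpected results, and it is strongly recommended that you not do so. You can refer to this link to see whether that strategy helps.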