I have a DAG with several tasks chained together by simple, direct dependencies:
import datetime as dt

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.settings import log


def task1_cb(ds, **kwargs):
    log.info('Task1 Complete for date: %s', kwargs.get('end_date'))


def task2_cb(ds, **kwargs):
    log.info('Task2 Complete for date: %s', kwargs.get('end_date'))


def task3_cb(ds, **kwargs):
    log.info('Task3 Complete for date: %s', kwargs.get('end_date'))


default_args = {
    'owner': 'airflow',
    'depends_on_past': False,  # a task instance does not wait for its own previous run
    'concurrency': 1,          # DAG-level argument; operators ignore it in default_args
    'retries': 0,
}

dag = DAG(
    'sample_serial_dag',
    start_date=dt.datetime(2018, 9, 1),
    end_date=dt.datetime(2018, 9, 5),
    default_args=default_args,
    schedule_interval='@daily',
    catchup=True,  # schedule a run for every missed daily interval up to end_date
)

# provide_context=True passes the Airflow context to the callables as **kwargs
task1 = PythonOperator(task_id='t1', provide_context=True, python_callable=task1_cb, dag=dag)
task2 = PythonOperator(task_id='t2', provide_context=True, python_callable=task2_cb, dag=dag)
task3 = PythonOperator(task_id='t3', provide_context=True, python_callable=task3_cb, dag=dag)

task1 >> task2 >> task3
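I want it to catch up on the past dates (running @daily). What I get now is that task 1 runs 5 times to cover all 5 dates, and only when those are done does task 2 run 5 times, and so on.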
The execution flow looks like this:
Task1 Complete for date: 2018-09-01
Task1 Complete for date: 2018-09-02
Task1 Complete for date: 2018-09-03
Task1 Complete for date: 2018-09-04
Task1 Complete for date: 2018-09-05
Task2 Complete for date: 2018-09-01
Task2 Complete for date: 2018-09-02
Task2 Complete for date: 2018-09-03
Task2 Complete for date: 2018-09-04
Task2 Complete for date: 2018-09-05
Task3 Complete for date: 2018-09-01
Task3 Complete for date: 2018-09-02
Task3 Complete for date: 2018-09-03
Task3 Complete for date: 2018-09-04
Task3 Complete for date: 2018-09-05
What I want instead is this execution flow:
Task1 Complete for date: 2018-09-01
Task2 Complete for date: 2018-09-01
Task3 Complete for date: 2018-09-01
Task1 Complete for date: 2018-09-02
Task2 Complete for date: 2018-09-02
Task3 Complete for date: 2018-09-02
Task1 Complete for date: 2018-09-03
Task2 Complete for date: 2018-09-03
Task3 Complete for date: 2018-09-03
Task1 Complete for date: 2018-09-04
Task2 Complete for date: 2018-09-04
Task3 Complete for date: 2018-09-04
Task1 Complete for date: 2018-09-05
Task2 Complete for date: 2018-09-05
Task3 Complete for date: 2018-09-05
Answer
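The cause of this strange behavior was that `depends_on_past` was set to False in default_args. I had copy-pasted it from some tutorial or sample code without really noticing it or understanding what it does.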
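Per the docs:
depends_on_past (bool) – when set to true, task instances will run sequentially while relying on the previous task's schedule to succeed. The task instance for the start_date is allowed to run.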
Setting it to True fixed the problem for me.
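To see why the flag changes the order, here is a toy, pure-Python model of the scheduling. It is not Airflow code: `TASKS`, `DATES`, `runnable` and `simulate` are hypothetical names for this sketch. It assumes each "scheduler loop" starts every task instance whose dependencies are met: its upstream task for the same date must be done and, when depends_on_past is on, the same task for the previous date must be done too, with the first date exempt, as the docs describe for the start_date instance.

```python
# Hypothetical toy model of the scheduling described above; NOT Airflow's scheduler.
TASKS = ["t1", "t2", "t3"]  # t1 >> t2 >> t3, as in the question
DATES = ["2018-09-01", "2018-09-02", "2018-09-03", "2018-09-04", "2018-09-05"]


def runnable(done, depends_on_past):
    """Return the (task, date) instances whose dependencies are all met.

    An instance needs its upstream task for the same date to be done and,
    when depends_on_past is True, the same task for the previous date to be
    done as well (the first date is exempt).
    """
    ready = []
    for date_idx, date in enumerate(DATES):
        for task_idx, task in enumerate(TASKS):
            if (task, date) in done:
                continue
            upstream_ok = task_idx == 0 or (TASKS[task_idx - 1], date) in done
            past_ok = (not depends_on_past
                       or date_idx == 0
                       or (task, DATES[date_idx - 1]) in done)
            if upstream_ok and past_ok:
                ready.append((task, date))
    return ready


def simulate(depends_on_past):
    """Each loop starts every ready instance at once; return the start order."""
    done, order = set(), []
    while len(done) < len(TASKS) * len(DATES):
        batch = runnable(done, depends_on_past)
        if not batch:
            raise RuntimeError("no runnable task instances: dependency deadlock")
        order.extend(batch)
        done.update(batch)
    return order


if __name__ == "__main__":
    for flag in (False, True):
        print("depends_on_past=%s" % flag)
        for task, date in simulate(flag):
            print("  %s Complete for date: %s" % (task, date))
```

With depends_on_past=False, the first loop can start t1 for all five dates, which reproduces the grouped log in the question. With True, t1 for 2018-09-02 cannot start before t1 for 2018-09-01 has finished, but in this model it becomes runnable alongside t2 for 2018-09-01, so each task is serialized across dates rather than whole runs being serialized. If runs must finish strictly one after another, the DAG-level `max_active_runs` argument is worth checking as well.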