How to run the whole DAG multiple times, instead of running each task multiple times

Time: 2018-12-17 16:46:32

Tags: python airflow

I have a DAG with several tasks chained together by simple, direct dependencies.

import datetime as dt

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

from airflow.settings import log


def task1_cb(ds, **kwargs):
    log.info('Task1 Complete for date: %s' % kwargs.get('end_date'))


def task2_cb(ds, **kwargs):
    log.info('Task2 Complete for date: %s' % kwargs.get('end_date'))


def task3_cb(ds, **kwargs):
    log.info('Task3 Complete for date: %s' % kwargs.get('end_date'))


default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'concurrency': 1,
    'retries': 0
}

dag = DAG(
    'sample_serial_dag',
    start_date=dt.datetime(2018,9,1),
    end_date=dt.datetime(2018,9,5),
    default_args=default_args,
    schedule_interval='@daily',
    catchup=True
)

task1 = PythonOperator(task_id='t1', provide_context=True, python_callable=task1_cb, dag=dag)
task2 = PythonOperator(task_id='t2', provide_context=True, python_callable=task2_cb, dag=dag)
task3 = PythonOperator(task_id='t3', provide_context=True, python_callable=task3_cb, dag=dag)

task1 >> task2 >> task3

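I want it to catch up on past dates (running @daily). What I currently get is that task 1 runs 5 times to catch up on the 5 past dates; once those finish, control passes to task 2, which then runs 5 times, and so on. The execution flow looks like this: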

Task1 Complete for date: 2018-09-01
Task1 Complete for date: 2018-09-02
Task1 Complete for date: 2018-09-03
Task1 Complete for date: 2018-09-04
Task1 Complete for date: 2018-09-05

Task2 Complete for date: 2018-09-01
Task2 Complete for date: 2018-09-02
Task2 Complete for date: 2018-09-03
Task2 Complete for date: 2018-09-04
Task2 Complete for date: 2018-09-05

Task3 Complete for date: 2018-09-01
Task3 Complete for date: 2018-09-02
Task3 Complete for date: 2018-09-03
Task3 Complete for date: 2018-09-04
Task3 Complete for date: 2018-09-05

Task 1 in graph completes for all the past dates before continuing to Task 2

What I want is the following execution flow:

Task1 Complete for date: 2018-09-01
Task2 Complete for date: 2018-09-01
Task3 Complete for date: 2018-09-01

Task1 Complete for date: 2018-09-02
Task2 Complete for date: 2018-09-02
Task3 Complete for date: 2018-09-02

Task1 Complete for date: 2018-09-03
Task2 Complete for date: 2018-09-03
Task3 Complete for date: 2018-09-03

Task1 Complete for date: 2018-09-04
Task2 Complete for date: 2018-09-04
Task3 Complete for date: 2018-09-04

Task1 Complete for date: 2018-09-05
Task2 Complete for date: 2018-09-05
Task3 Complete for date: 2018-09-05

1 Answer:

Answer 0 (score: 0)

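The reason for this odd behavior was that depends_on_past in default_args was set to False. I had copy-pasted it from some tutorial or sample code without really noticing it or knowing what it does. As per the docs: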

  

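depends_on_past (bool) – when set to true, task instances will run sequentially while relying on the previous task's schedule to succeed. The task instance for the start_date is allowed to run.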

Setting it to True solved my problem.
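
For reference, a minimal sketch of the change, assuming the rest of the DAG file stays exactly as in the question. Only the depends_on_past entry differs; the dictionary is still passed to DAG(...) through default_args.

```python
# Sketch: the default_args from the question with only depends_on_past flipped.
default_args = {
    'owner': 'airflow',
    'depends_on_past': True,  # a task instance waits for its own previous run to succeed
    'concurrency': 1,
    'retries': 0
}

# Used as in the question, e.g. DAG('sample_serial_dag', default_args=default_args, ...)
```

Note that depends_on_past ties each task only to its own previous run, so t1 for 2018-09-02 waits on t1 for 2018-09-01, not on t3 for 2018-09-01. If strictly one DAG run at a time is required, the DAG-level max_active_runs argument may also be worth checking; that suggestion is an assumption and not part of the original answer.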