Dynamic tasks are skipped in an Airflow DAG

Date: 2019-01-09 14:20:25

Tags: dynamic task airflow

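Below is a simplified reproduction of the DAG I created. The DAG uses a branch operator to choose an execution path, and the branches merge into a common task. That task is supposed to produce a list of files, and one task is then created for each entry in the list file. The problem is that I cannot get the dynamic tasks to execute.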

"""
Required packages to execute DAG
"""
from __future__ import print_function
from builtins import range
import airflow
from airflow.models import DAG

from datetime import datetime, timedelta
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator
from airflow.operators.python_operator import BranchPythonOperator
from airflow.operators.dummy_operator import DummyOperator
from airflow.utils.trigger_rule import TriggerRule

import os
import sys


# DAG parameters

args = {
    'owner': 'AD',
    'depends_on_past': False,
    'start_date': datetime(2018, 5, 30),
    'end_date': datetime(9999, 12, 31),
    'dagrun_timeout': None,
    'timeout': None,
    'execution_timeout': None,
    'provide_context': True,
}

# Create the DAG object with a name and default_args (args can be set in the DAG definition or at runtime)
dag = DAG('sodag', schedule_interval=None, default_args=args)


# Define tasks - below are examples of tasks instantiated by PythonOperator, calling methods written in other Python classes
start = DummyOperator(task_id='start', dag=dag)
dummyjoin = DummyOperator(task_id='dummyjoin', dag=dag, trigger_rule=TriggerRule.ONE_SUCCESS)
multidummy = DummyOperator(task_id='multidummy', dag=dag)


def identify_pre_process(**context):
    return 'task1'


def xcl_preq(filename, **kwargs):
    return BashOperator(
            task_id="so_dag{}".format(filename),
            trigger_rule=TriggerRule.ONE_SUCCESS,
            provide_context=True,
            bash_command='echo "executing branch tasks"',
            dag=dag)


with dag:
    router = BranchPythonOperator(task_id='trigger_pre_process',
                                  python_callable=identify_pre_process,
                                  dag=dag)

    task1 = BashOperator(
                    task_id="task1",
                    bash_command='echo "executing task1"',
                    execution_timeout=None,
                    dag=dag)

    task2 = BashOperator(
                    task_id="task2",
                    bash_command='echo "executing task2"',
                    execution_timeout=None,
                    dag=dag)

with open('/root/filelist.txt', 'r') as fp:
    for file in fp:
        filename = os.path.basename(file)
        dummyjoin >> xcl_preq(filename) >> multidummy


start >> router
router >> task1 >> dummyjoin
router >> task2 >> dummyjoin


1 answer:

Answer 0 (score: 1)

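The problem is not caused by the tasks being generated dynamically, but by something more subtle. Your DAG works fine except for the following detail: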

filename = os.path.basename(file)

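The variable filename will include the trailing newline character \n. In your example, filename will take values such as file\n, file1\n and file2\n. This prevents those tasks from running, because special characters are apparently not allowed in the value of task_id (I agree it is odd that no error is raised when the DAG is compiled). You will not notice this in the "Graph View" of the DAG in the UI, because the newline is not displayed there, but the problem becomes visible if you click on the "Details" of the DAG.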
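To see the effect outside Airflow, here is a minimal sketch that mimics the loop from the question. It assumes Python 3, and the in-memory file and the /data/... paths are made-up stand-ins for /root/filelist.txt, not the real contents of that file.

```python
import io
import os

# Hypothetical stand-in for /root/filelist.txt (the paths are illustrative only)
fp = io.StringIO("/data/file\n/data/file1\n/data/file2\n")

task_ids = []
for line in fp:
    # Iterating over a file yields each line with its trailing "\n",
    # and os.path.basename does not remove it
    filename = os.path.basename(line)
    task_ids.append("so_dag{}".format(filename))

# repr() makes the otherwise invisible newline visible
for task_id in task_ids:
    print(repr(task_id))
```

Printing with repr() is a quick way to spot stray whitespace in generated task ids before they reach the scheduler.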

Dag details screenshot

A simple fix is to strip the newline from each line after reading it from the file, i.e.

filename = os.path.basename(file.rstrip())
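A slightly more defensive variant might also skip blank lines and reject names that would not make a clean task_id. The sketch below assumes Python 3; the helper clean_filenames and the SAFE_ID pattern are hypothetical names introduced here for illustration, and the allowed character set (letters, digits, underscores, dashes, dots) is an assumption rather than something taken from the question.

```python
import io
import os
import re

# Assumed rule for illustration: letters, digits, underscores, dashes and dots
SAFE_ID = re.compile(r"[A-Za-z0-9_.\-]+")


def clean_filenames(lines):
    """Yield basenames with surrounding whitespace removed, skipping blank lines."""
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        name = os.path.basename(stripped)
        # fullmatch must consume the whole string, so a stray newline
        # or space anywhere in the name is rejected
        if not SAFE_ID.fullmatch(name):
            raise ValueError("unsafe task_id fragment: {!r}".format(name))
        yield name


# Hypothetical file contents, including an empty line
sample = io.StringIO("/data/file\n\n/data/file1\n/data/file2\n")
names = list(clean_filenames(sample))
print(names)
```

In the DAG from the question, the loop body would then call xcl_preq(name) for each yielded name. Failing loudly at parse time is a design choice here: it surfaces a bad entry in the list file immediately instead of leaving a task that silently never runs.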

Success!

successful DAG screenshot