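Below is a simplified reproduction of the DAG I created. The DAG uses a branch operator to choose between execution paths, which then merge into a common task. That task is supposed to generate a list of files, and one task should then be created for each entry in the list file. The problem is that I cannot get the dynamic tasks to execute.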
"""
Required packages to execute DAG
"""
from __future__ import print_function
from builtins import range
import airflow
from airflow.models import DAG
from datetime import datetime, timedelta
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator
from airflow.operators.python_operator import BranchPythonOperator
from airflow.operators.dummy_operator import DummyOperator
from airflow.utils.trigger_rule import TriggerRule
import os
import sys
# DAG parameters
args = {
    'owner': 'AD',
    'depends_on_past': False,
    'start_date': datetime(2018, 5, 30),
    'end_date': datetime(9999, 12, 31),
    'dagrun_timeout': None,
    'timeout': None,
    'execution_timeout': None,
    'provide_context': True,
}
# Create the DAG object with a name and default_args (args can be set in the DAG definition or at execution/runtime)
dag = DAG('sodag', schedule_interval=None, default_args=args)
# Define tasks - below are examples of tasks created by instantiating operators
start = DummyOperator(task_id='start', dag=dag)
dummyjoin = DummyOperator(task_id='dummyjoin', dag=dag, trigger_rule=TriggerRule.ONE_SUCCESS)
multidummy = DummyOperator(task_id='multidummy', dag=dag)
def identify_pre_process(**context):
    return 'task1'

def xcl_preq(filename, **kwargs):
    return BashOperator(
        task_id="so_dag{}".format(filename),
        trigger_rule=TriggerRule.ONE_SUCCESS,
        provide_context=True,
        bash_command='echo "executing branch tasks"',
        dag=dag)
with dag:
    router = BranchPythonOperator(task_id='trigger_pre_process',
                                  python_callable=identify_pre_process,
                                  dag=dag)
    task1 = BashOperator(
        task_id="task1",
        bash_command='echo "executing task1"',
        execution_timeout=None,
        dag=dag)
    task2 = BashOperator(
        task_id="task2",
        bash_command='echo "executing task2"',
        execution_timeout=None,
        dag=dag)
    with open('/root/filelist.txt', 'r') as fp:
        for file in fp:
            filename = os.path.basename(file)
            dummyjoin >> xcl_preq(filename) >> multidummy
    start >> router
    router >> task1 >> dummyjoin
    router >> task2 >> dummyjoin
Answer 0 (score: 1)
The problem is not that the tasks are generated dynamically; it is something more subtle. Your DAG works well, apart from the following detail:
filename = os.path.basename(file)
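The variable filename will contain a trailing newline character \n. In your example, filename takes the values file\n, file1\n, and file2\n. These tasks then fail to run, because special characters are apparently not allowed in the value of a task_id (I agree it is strange that no error is raised when the DAG is compiled). You will not notice this in the DAG's "Graph View" in the UI, because newline characters are not displayed there, but the problem becomes visible if you click on the DAG's "Details".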
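The effect can be checked without Airflow. The following is a minimal sketch that mimics the loop from the question; the in-memory `fake_file` is a stand-in I am assuming for /root/filelist.txt, with the three entries taken from the example values above.

```python
import io
import os

# Assumed stand-in for /root/filelist.txt (one entry per line).
fake_file = io.StringIO("file\nfile1\nfile2\n")

# Iterating over a file object yields each line including its trailing "\n",
# and os.path.basename() does not strip it.
task_ids = ["so_dag{}".format(os.path.basename(line)) for line in fake_file]

# repr() makes the otherwise invisible newline show up as \n in the output.
for task_id in task_ids:
    print(repr(task_id))
```

Printing with repr() is what makes the trailing \n visible; a plain print() would only show an extra blank line.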
A simple fix is to strip the newline from each line after reading it from the file, i.e.
filename = os.path.basename(file.rstrip())
Success!
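Since no error is raised when the DAG is parsed, it may be worth failing early yourself. Here is a rough sketch of a more defensive reader; `read_task_names` and the `_SAFE_ID` whitelist are hypothetical names and rules chosen for illustration, not something taken from Airflow, so adjust the pattern to whatever your Airflow version actually accepts.

```python
import io
import os
import re

# Assumed whitelist for illustration only: letters, digits, "_", ".", "-".
_SAFE_ID = re.compile(r"[A-Za-z0-9_.-]+")


def read_task_names(lines):
    """Return cleaned base names, one per non-empty line."""
    names = []
    for line in lines:
        # strip() removes the trailing newline and surrounding whitespace.
        name = os.path.basename(line.strip())
        if not name:
            continue  # skip blank lines
        # fullmatch() is used on purpose: with match() and a "$" anchor,
        # a trailing newline would still be accepted.
        if not _SAFE_ID.fullmatch(name):
            raise ValueError("unsafe task name: {!r}".format(name))
        names.append(name)
    return names


names = read_task_names(io.StringIO("file\n\n/data/file1.csv  \nfile2"))
print(names)
```

In the DAG, the loop would then iterate over `read_task_names(fp)` and pass each name to `xcl_preq`, so a bad entry in the list file stops the DAG from parsing instead of silently producing tasks that never run.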