Assigning an Airflow task to multiple DAGs

Time: 2020-08-22 00:09:43

Tags: airflow

I am trying to reuse an existing Airflow task by assigning it to a different DAG.

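import copy

from airflow import models
from airflow.models import BaseOperator
from airflow.operators import python_operator


def create_new_task_for_dag(task: BaseOperator,
                            dag: models.DAG) -> BaseOperator: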
    """Create a deep copy of given task and associate it with given dag
    """
    new_task = copy.deepcopy(task)
    new_task.dag = dag
    return new_task


print_datetime_task = python_operator.PythonOperator(
    task_id='print_datetime', python_callable=_print_datetime)

# define a new dag ...
# add to the new dag
create_new_task_for_dag(print_datetime_task, new_dag)

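This then fails with the error "Task is missing the start_date parameter". If I instead pass the DAG when creating the operator, i.e. print_datetime_task = PythonOperator(task_id='print_datetime', python_callable=_print_datetime, dag=new_dag), it works.

I searched around, and this seems to be the root cause: https://github.com/apache/airflow/pull/5598, but that PR has been marked as stale.

I would like to know whether there is another way to reuse an existing Airflow task by assigning it to a different DAG.

I am using apache-airflow[docker,kubernetes]==1.10.10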

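1 answer:

Answer 0 (score: 3)

While I don't know of a way to solve this with your current design (code layout), it can be made to work by tweaking the design slightly (note that the following code snippets are untested).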


Instead of copying a task from a DAG,

def create_new_task_for_dag(task: BaseOperator,
                            dag: models.DAG) -> BaseOperator:
    """Create a deep copy of given task and associate it with given dag
    """
    new_task = copy.deepcopy(task)
    new_task.dag = dag
    return new_task

you can move the instantiation of the task (along with its assignment to the DAG) into a separate utility function.

from datetime import datetime
from typing import Dict, Any

from airflow.models.dag import DAG
from airflow.operators.python_operator import PythonOperator


def add_new_print_datetime_task(my_dag: DAG,
                                kwargs: Dict[str, Any]) -> PythonOperator:
    """
    Creates and adds a new 'print_datetime' (PythonOperator) task in 'my_dag'
    and returns its reference
    :param my_dag: reference to DAG object in which to add the task
    :type my_dag: DAG
    :param kwargs: dictionary of args for PythonOperator / BaseOperator
                   'task_id' is mandatory
    :type kwargs: Dict[str, Any]
    :return: PythonOperator
    """

    def my_callable() -> None:
        print(datetime.now())

    return PythonOperator(dag=my_dag, python_callable=my_callable, **kwargs)

After that, you can call this function every time you want to instantiate the same task (and assign it to any DAG):

with DAG(dag_id="my_dag_id", start_date=datetime(year=2020, month=8, day=22, hour=16, minute=30)) as my_dag:
    print_datetime_task_kwargs: Dict[str, Any] = {
        "task_id": "my_task_id",
        "depends_on_past": True
    }
    print_datetime_task: PythonOperator = add_new_print_datetime_task(my_dag=my_dag, kwargs=print_datetime_task_kwargs)
    # ... other tasks and their wiring
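
If several kinds of task need this treatment, the same idea can be generalized so that you don't write one utility function per task. The following is only a minimal sketch under stated assumptions: make_task_factory and StubOperator are hypothetical names that are not part of Airflow, and StubOperator merely stands in for a real operator class (such as PythonOperator) so that the snippet runs without Airflow installed. The point it illustrates is that each call constructs a fresh object and passes the DAG to the constructor, rather than deep-copying an existing task and attaching the DAG afterwards.

```python
from typing import Any, Callable, Dict, Type


def make_task_factory(operator_cls: Type[Any],
                      **fixed_kwargs: Any) -> Callable[..., Any]:
    """Return a function that builds a new operator_cls instance per DAG.

    Hypothetical helper (not an Airflow API): 'fixed_kwargs' are the
    arguments shared by every instance, e.g. task_id or python_callable.
    """

    def build(dag: Any, **overrides: Any) -> Any:
        # Per-call overrides take precedence over the shared arguments.
        params: Dict[str, Any] = {**fixed_kwargs, **overrides}
        # The DAG is handed to the constructor, so whatever the operator
        # class does at construction time can see it.
        return operator_cls(dag=dag, **params)

    return build


class StubOperator:
    """Stand-in for an operator class; NOT an Airflow class."""

    def __init__(self, task_id: str, dag: Any, **kwargs: Any) -> None:
        self.task_id = task_id
        self.dag = dag
        self.kwargs = kwargs


# One factory, many DAGs: every call yields an independent task object.
print_datetime_factory = make_task_factory(StubOperator,
                                           task_id="print_datetime")
task_for_dag_a = print_datetime_factory("dag_a")
task_for_dag_b = print_datetime_factory("dag_b", task_id="print_datetime_b")
```

In an Airflow project you would presumably pass a real operator class and real DAG objects in place of StubOperator and the placeholder strings; as with the snippets above, that combination is untested here.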

References / good reads