I am trying to reuse an existing Airflow task by assigning it to a different DAG.
    import copy

    from airflow import models
    from airflow.models import BaseOperator
    from airflow.operators import python_operator


    def create_new_task_for_dag(task: BaseOperator,
                                dag: models.DAG) -> BaseOperator:
        """Create a deep copy of given task and associate it with given dag
        """
        new_task = copy.deepcopy(task)
        new_task.dag = dag
        return new_task


    # Note: the task is created without a dag argument
    print_datetime_task = python_operator.PythonOperator(
        task_id='print_datetime', python_callable=_print_datetime)

    # define a new dag ...

    # add to the new dag
    create_new_task_for_dag(print_datetime_task, new_dag)
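It then fails with the error "Task is missing the start_date parameter".
If I pass the DAG when creating the operator, print_datetime_task = PythonOperator(task_id='print_datetime', python_callable=_print_datetime, dag=new_dag), it works fine.
I searched around, and this seems to be the root cause: https://github.com/apache/airflow/pull/5598, but that PR has been marked as stale.
Is there any other way to reuse an existing Airflow task by assigning it to a different DAG?
I am using apache-airflow[docker,kubernetes]==1.10.10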
Answer 0 (score: 3)
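I don't know of a fix for the problem with your current design (code layout), but it can be made to work by tweaking the design slightly (note that the following code snippets have NOT been tested).
Instead of copying a task from a DAG,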
    def create_new_task_for_dag(task: BaseOperator,
                                dag: models.DAG) -> BaseOperator:
        """Create a deep copy of given task and associate it with given dag
        """
        new_task = copy.deepcopy(task)
        new_task.dag = dag
        return new_task
you can move the instantiation of the task (along with its assignment to the DAG) into a separate utility function.
    from datetime import datetime
    from typing import Dict, Any

    from airflow.models.dag import DAG
    from airflow.operators.python_operator import PythonOperator


    def add_new_print_datetime_task(my_dag: DAG,
                                    kwargs: Dict[str, Any]) -> PythonOperator:
        """
        Creates and adds a new 'print_datetime' (PythonOperator) task in 'my_dag'
        and returns its reference
        :param my_dag: reference to DAG object in which to add the task
        :type my_dag: DAG
        :param kwargs: dictionary of args for PythonOperator / BaseOperator
                       'task_id' is mandatory
        :type kwargs: Dict[str, Any]
        :return: PythonOperator
        """

        def my_callable() -> None:
            print(datetime.now())

        # The operator is bound to my_dag at construction time
        return PythonOperator(dag=my_dag, python_callable=my_callable, **kwargs)
Thereafter, you can call that function every time you want to instantiate the same task (and assign it to any DAG).
    with DAG(dag_id="my_dag_id",
             start_date=datetime(year=2020, month=8, day=22, hour=16, minute=30)) as my_dag:
        print_datetime_task_kwargs: Dict[str, Any] = {
            "task_id": "my_task_id",
            "depends_on_past": True
        }
        print_datetime_task: PythonOperator = add_new_print_datetime_task(
            my_dag=my_dag, kwargs=print_datetime_task_kwargs)
        # ... other tasks and their wiring
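
If several kinds of task need to be shared across DAGs, the same idea can be generalised into a small factory builder. The sketch below is only an illustration of that pattern and is not part of Airflow: `make_task_factory`, `StubDag` and `StubOperator` are hypothetical names, and the two stub classes are stand-ins so that the snippet runs without Airflow installed. With Airflow, the assumption is that you would pass `PythonOperator` (plus its `python_callable`) where `StubOperator` is used here.

```python
from typing import Any, Callable, Dict


def make_task_factory(operator_cls: Callable[..., Any],
                      **defaults: Any) -> Callable[..., Any]:
    """Return a function that builds a fresh operator_cls instance for a DAG."""

    def build(dag: Any, **overrides: Any) -> Any:
        # Per-call overrides win over the shared defaults.
        params: Dict[str, Any] = {**defaults, **overrides}
        # A new instance is created on every call, so nothing is
        # copied or shared between DAGs.
        return operator_cls(dag=dag, **params)

    return build


class StubDag:
    """Hypothetical stand-in for airflow.models.DAG (only stores tasks)."""

    def __init__(self, dag_id: str) -> None:
        self.dag_id = dag_id
        self.tasks: Dict[str, "StubOperator"] = {}


class StubOperator:
    """Hypothetical stand-in for an operator that accepts a dag= argument."""

    def __init__(self, dag: StubDag, task_id: str, **kwargs: Any) -> None:
        self.dag = dag
        self.task_id = task_id
        self.kwargs = kwargs
        dag.tasks[task_id] = self  # register with the owning DAG


# One shared definition, instantiated separately for each DAG.
new_print_datetime_task = make_task_factory(StubOperator, task_id="print_datetime")

dag_a = StubDag("dag_a")
dag_b = StubDag("dag_b")
task_a = new_print_datetime_task(dag_a)
task_b = new_print_datetime_task(dag_b, depends_on_past=True)
```

Each call builds an independent task that is bound to its DAG at construction time, which is the property the utility function above relies on instead of `copy.deepcopy`.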