on_failure_callback in dynamic DAGs

Time: 2019-02-18 05:38:36

Tags: python airflow google-cloud-composer

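I am creating DAGs dynamically from a list and want to add an on_failure_callback to one of the tasks. I tried the code below, but the callback does not seem to be executed.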

dag_ids = ['dag_a', 'dag_b', 'dag_c']

for dag_id in dag_ids:

    def failure_callback():
        logging.info('Inside failure callback for {}'.format(dag_id))

    def python_callable(dag_id):
        logging.info('Inside python callable for {}'.format(dag_id))
        raise Exception('Exception raised for dag_id {}'.format(dag_id))

    yesterday = datetime.datetime.combine(
        datetime.datetime.today() - datetime.timedelta(1),
        datetime.datetime.min.time())

    default_args = {
        'start_date': yesterday
    }    

    dag = models.DAG(
            dag_id,
            schedule_interval=None,
            catchup=False,
            default_args=default_args)

    with dag:

        python_task = PythonOperator(
            task_id='python_task',
            python_callable=python_callable,
            op_kwargs={'dag_id': dag_id},
            on_failure_callback=failure_callback,
            dag=dag)

        python_task

    globals()[dag_id] = dag

Any idea what I am doing wrong?

Edit:

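As suggested, I passed dag_id to the failure callback. However, Airflow passes the context dictionary in place of dag_id. Any ideas on how to pass extra arguments to the failure callback in addition to the context dictionary?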

ERROR - Inside failure callback for {u'next_execution_date': None, u'dag_run': <DagRun dag_a @ 2019-02-19 19:23:54.006241: manual__2019-02-19T19:23:54.006241, externally triggered: True>, u'tomorrow_ds_nodash': u'20190220', u'run_id': 'manual__2019-02-19T19:23:54.006241', u'test_mode': False, u'prev_execution_date': None, u'conf': <module 'airflow.configuration' from '/usr/local/lib/airflow/airflow/configuration.py'>, u'tables': None, u'task_instance_key_str': u'dag_a__python_task__20190219', u'END_DATE': '2019-02-19', u'execution_date': datetime.datetime(2019, 2, 19, 19, 23, 54, 6241), u'ts': '2019-02-19T19:23:54.006241', u'macros': <module 'airflow.macros' from '/usr/local/lib/airflow/airflow/macros/__init__.py'>, u'params': {}, u'ti': <TaskInstance: dag_a.python_task 2019-02-19 19:23:54.006241 [failed]>, u'var': {u'json': None, u'value': None}, u'ds_nodash': u'20190219', u'dag': <DAG: dag_a>, u'end_date': '2019-02-19', u'latest_date': '2019-02-19', u'ds': '2019-02-19', u'task_instance': <TaskInstance: dag_a.python_task 2019-02-19 19:23:54.006241 [failed]>, u'yesterday_ds_nodash': u'20190218', u'task': <Task(PythonOperator): python_task>, u'yesterday_ds': '2019-02-18', u'ts_nodash': u'20190219T192354.006241', u'tomorrow_ds': '2019-02-20'}

I referred to the question here and got it working!

2 answers:

Answer 0 (score: 0)

Inside the for loop, just pass dag_id to failure_callback; you will then be able to see the log output from failure_callback().

def failure_callback(dag_id):
    logging.info('Inside failure callback for {}'.format(dag_id))
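One caveat: judging by the logged output in the question's edit, Airflow calls the callback with a single argument, the context dictionary, so a parameter named dag_id ends up holding the context. A minimal sketch, assuming the context carries a 'dag' entry as that log shows, reads the id from the context instead. The fake_context object is a hypothetical stand-in for trying this locally, not something Airflow provides.

```python
import logging
from types import SimpleNamespace


def failure_callback(context):
    """Log the failing DAG's id, read from the context dict passed in."""
    # The question's log shows a 'dag' key holding the DAG object.
    dag_id = context['dag'].dag_id
    logging.error('Inside failure callback for %s', dag_id)
    return dag_id


# Hypothetical stand-in for the real context, only for a local dry run.
fake_context = {'dag': SimpleNamespace(dag_id='dag_a')}
result = failure_callback(fake_context)
```

With this shape the callback needs no extra arguments, since the DAG id travels with the context.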

Answer 1 (score: 0)

I used functools.partial to get it working. The updated code is below.

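import datetime
import logging
from functools import partial

# Airflow 1.10-style import paths
from airflow import models
from airflow.operators.python_operator import PythonOperator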

dag_ids = ['dag_a', 'dag_b', 'dag_c']

for dag_id in dag_ids:

    def failure_callback(dag_id, context):
        logging.error('Inside failure callback for {}'.format(dag_id))

    def python_callable(dag_id):
        logging.error('Inside python callable for {}'.format(dag_id))
        raise Exception('Exception raised for dag_id {}'.format(dag_id))

    yesterday = datetime.datetime.combine(
        datetime.datetime.today() - datetime.timedelta(1),
        datetime.datetime.min.time())

    default_args = {
        'start_date': yesterday
    }    

    dag = models.DAG(
            dag_id,
            # No schedule: the DAG runs only when triggered manually
            schedule_interval=None,
            catchup=False,
            default_args=default_args)

    with dag:

        python_task = PythonOperator(
            task_id='python_task',
            python_callable=python_callable,
            op_kwargs={'dag_id': dag_id},
            on_failure_callback=partial(failure_callback, dag_id),
            dag=dag)

        python_task

    globals()[dag_id] = dag
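To see why partial helps, here is a small sketch that runs without Airflow. It assumes the callback is invoked with one positional context argument, as the question's log suggests; fake_context and the returned message format are made up for illustration. It also contrasts partial with a plain closure, which looks dag_id up only when it is called and so sees the loop's last value.

```python
from functools import partial


def failure_callback(dag_id, context):
    # dag_id is pre-bound by partial; context is supplied by the caller.
    return 'Inside failure callback for {} (run_id={})'.format(
        dag_id, context['run_id'])


dag_ids = ['dag_a', 'dag_b', 'dag_c']

# partial freezes the value of the id at the moment it is created.
bound = {d: partial(failure_callback, d) for d in dag_ids}

# A plain closure resolves dag_id at call time, after the loop has finished.
closures = {}
for dag_id in dag_ids:
    closures[dag_id] = lambda context: failure_callback(dag_id, context)

# Hypothetical stand-in for the context dict, for illustration only.
fake_context = {'run_id': 'manual__example'}

partial_messages = [bound[d](fake_context) for d in dag_ids]
closure_messages = [closures[d](fake_context) for d in dag_ids]
```

Here partial_messages names dag_a, dag_b and dag_c in turn, while every entry in closure_messages names dag_c.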