Trigger SubDag

Time: 2018-03-23 12:40:46

Tags: airflow

EDIT: I have edited this question based on the input from @tobi6.

I copied the SubDag operator from the Airflow source code.

Source code: https://github.com/apache/incubator-airflow/blob/master/airflow/operators/subdag_operator.py

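I modified a few things in the execute method so that it triggers the SubDag and waits for the SubDag to finish executing. The trigger works fine, but the tasks are never executed (the DAG run is in the running/green state, while its tasks stay in the null/white state).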

See my changes below:

from airflow.exceptions import AirflowException
from airflow.models import BaseOperator, Pool
from airflow.utils.decorators import apply_defaults
from airflow.utils.db import provide_session
from airflow.utils.state import State
from airflow.executors import GetDefaultExecutor
from time import sleep
import logging

from datetime import datetime


class SubDagOperator(BaseOperator):

    template_fields = tuple()
    ui_color = '#555'
    ui_fgcolor = '#fff'

    @provide_session
    @apply_defaults
    def __init__(
            self,
            subdag,
            executor=GetDefaultExecutor(),
            *args, **kwargs):
        """
        Yo dawg. This runs a sub dag. By convention, a sub dag's dag_id
        should be prefixed by its parent and a dot. As in `parent.child`.

        :param subdag: the DAG object to run as a subdag of the current DAG.
        :type subdag: airflow.DAG
        :param dag: the parent DAG
        :type dag: airflow.DAG
        """
        import airflow.models
        dag = kwargs.get('dag') or airflow.models._CONTEXT_MANAGER_DAG
        if not dag:
            raise AirflowException('Please pass in the `dag` param or call '
                                   'within a DAG context manager')
        session = kwargs.pop('session')
        super(SubDagOperator, self).__init__(*args, **kwargs)

        # validate subdag name
        if dag.dag_id + '.' + kwargs['task_id'] != subdag.dag_id:
            raise AirflowException(
                "The subdag's dag_id should have the form "
                "'{{parent_dag_id}}.{{this_task_id}}'. Expected "
                "'{d}.{t}'; received '{rcvd}'.".format(
                    d=dag.dag_id, t=kwargs['task_id'], rcvd=subdag.dag_id))

        # validate that subdag operator and subdag tasks don't have a
        # pool conflict
        if self.pool:
            conflicts = [t for t in subdag.tasks if t.pool == self.pool]
            if conflicts:
                # only query for pool conflicts if one may exist
                pool = (
                    session
                    .query(Pool)
                    .filter(Pool.slots == 1)
                    .filter(Pool.pool == self.pool)
                    .first()
                )
                if pool and any(t.pool == self.pool for t in subdag.tasks):
                    raise AirflowException(
                        'SubDagOperator {sd} and subdag task{plural} {t} both '
                        'use pool {p}, but the pool only has 1 slot. The '
                        'subdag tasks will never run.'.format(
                            sd=self.task_id,
                            plural='s' if len(conflicts) > 1 else '',
                            t=', '.join(t.task_id for t in conflicts),
                            p=self.pool
                        )
                    )

        self.subdag = subdag
        self.executor = executor

    def execute(self, context):
        dag_run = self.subdag.create_dagrun(
            conf=context['dag_run'].conf,
            state=State.RUNNING,
            execution_date=context['execution_date'],
            run_id='trig__' + str(datetime.utcnow()),
            external_trigger=True
        )


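        # Poll until the triggered SubDag run reaches a terminal state.
        # get_state() only returns the state cached on this DagRun object,
        # so reload it from the metadata DB on every iteration.
        while True:
            dag_run.refresh_from_db()
            if dag_run.get_state() in (State.SUCCESS, State.FAILED):
                break
            sleep(10)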
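The `while True` loop in `execute` above polls the triggered run with no upper bound, so a run that never finishes (as with the tasks stuck in the null state) blocks the task forever. A minimal, Airflow-free sketch of the same polling pattern with a timeout follows; `wait_for_terminal_state` and its parameters are hypothetical names for illustration, not part of Airflow's API. The default terminal states mirror the string values of Airflow's `State.SUCCESS` and `State.FAILED`.

```python
import time
from typing import Callable, Iterable


def wait_for_terminal_state(
    get_state: Callable[[], str],
    terminal_states: Iterable[str] = ("success", "failed"),
    poll_interval: float = 10.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call get_state() until it returns a terminal state, or raise TimeoutError.

    Hypothetical helper, not part of Airflow: get_state is expected to re-read
    the state from its source (e.g. the metadata DB) on every call.
    """
    terminal = set(terminal_states)
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state in terminal:
            return state
        if clock() >= deadline:
            raise TimeoutError(
                "state still {!r} after {} seconds".format(state, timeout)
            )
        sleep(poll_interval)


# Usage with a fake state source: two "running" polls, then "success".
fake_states = iter(["running", "running", "success"])
print(wait_for_terminal_state(lambda: next(fake_states), poll_interval=0))  # prints "success"
```

Injecting `sleep` and `clock` keeps the helper testable without real waiting. In the operator, `get_state` could wrap `dag_run.refresh_from_db()` followed by `dag_run.get_state()`, and the `TimeoutError` could be converted into an `AirflowException`; that wiring is an assumption and has not been tested against Airflow here.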

The code below shows how I use it:

from airflow import DAG
from operators.sd_operator import SubDagOperator  # My SubDag Operator
from airflow.operators.python_operator import PythonOperator

import logging
from datetime import datetime

default_args = {
        'owner': 'airflow',
        'depends_on_past': False,
        'start_date': datetime(2017, 7, 17),
        'email': ['airflow@example.com'],
        'email_on_failure': False,
        'email_on_retry': False,
    }


def print_dag_details(**kwargs):
    logging.info(str(kwargs['dag_run'].conf))


with DAG('example_dag', schedule_interval=None, catchup=False, default_args=default_args) as dag:
    task_1 = SubDagOperator(
        subdag=sub_dag_func('example_dag', 'sub_dag_1'),
        task_id='sub_dag_1'
    )

    task_2 = SubDagOperator(
        subdag=sub_dag_func('example_dag', 'sub_dag_2'),
        task_id='sub_dag_2',
    )

    print_kwargs = PythonOperator(
        task_id='print_kwargs',
        python_callable=print_dag_details,
        provide_context=True
    )

    print_kwargs >> task_1 >> task_2 
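The last line relies on Airflow's bitshift composition: `a >> b` makes `b` downstream of `a` and returns `b`, which is what lets `print_kwargs >> task_1 >> task_2` chain left to right. The toy `Task` class below is a hypothetical, Airflow-free sketch of that mechanism for illustration only; it is not `BaseOperator`.

```python
class Task:
    """Toy stand-in for an operator (hypothetical, not Airflow's BaseOperator)."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()
        self.downstream = set()

    def set_downstream(self, other):
        # Record the dependency edge on both tasks.
        self.downstream.add(other.task_id)
        other.upstream.add(self.task_id)

    def __rshift__(self, other):
        # self >> other: other runs after self; returning other allows chaining.
        self.set_downstream(other)
        return other


print_kwargs = Task("print_kwargs")
task_1 = Task("sub_dag_1")
task_2 = Task("sub_dag_2")

# Evaluated as (print_kwargs >> task_1) >> task_2, i.e. task_1 >> task_2.
print_kwargs >> task_1 >> task_2
print(sorted(task_1.upstream), sorted(task_1.downstream))
```

Because `>>` is left-associative in Python, each step only needs to return its right-hand operand for the whole chain to wire up correctly.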

Any information you can provide would help. Thanks in advance.

1 answer:

Answer 0 (score: 1)

It is a bit hard to understand your question without more context.

“I copied the subdag operator and modified some things in the execute method.”

  • Copied from where?

“The trigger works fine...”

  • How does that work?

I noticed a few things in the code:

  • It might help to use named arguments in the call to sub_dag_func, e.g. sub_dag_func(subdag='parent_dag'...)

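  • In the bitshift definitions used to set upstream/downstream, I cannot find the tasks (df_job_1, df_job_2) defined in the DAG. This may be related to the SubDAG (I haven't looked into it yet).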

  • The subdag names seem inconsistent with the comment in the code, "By convention, a sub dag's dag_id should be prefixed by its parent and a dot", yet they are sub_dag_1 and sub_dag_2.
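The naming convention mentioned in the last point is enforced in the operator's constructor by the check `dag.dag_id + '.' + kwargs['task_id'] != subdag.dag_id`. A minimal sketch of that same check as a standalone function follows; `expected_subdag_id` and `check_subdag_id` are hypothetical names, and `ValueError` stands in for `AirflowException` so the sketch runs without Airflow. For the question's DAG, the subdag attached to task `sub_dag_1` of `example_dag` would need the dag_id `example_dag.sub_dag_1`, so whatever `sub_dag_func` builds (its body is not shown) must produce that string.

```python
def expected_subdag_id(parent_dag_id: str, task_id: str) -> str:
    """dag_id a subdag must have: '<parent_dag_id>.<task_id>'."""
    return "{}.{}".format(parent_dag_id, task_id)


def check_subdag_id(parent_dag_id: str, task_id: str, subdag_id: str) -> None:
    """Raise ValueError if subdag_id breaks the parent.child convention."""
    expected = expected_subdag_id(parent_dag_id, task_id)
    if subdag_id != expected:
        raise ValueError(
            "The subdag's dag_id should be {!r}; received {!r}.".format(
                expected, subdag_id
            )
        )


# Follows the convention: no exception.
check_subdag_id("example_dag", "sub_dag_1", "example_dag.sub_dag_1")

# Missing the parent prefix: rejected.
try:
    check_subdag_id("example_dag", "sub_dag_1", "sub_dag_1")
except ValueError as err:
    print(err)
```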