Airflow "none_failed" skipping when upstream skips

Time: 2019-10-09 16:38:18

Tags: python airflow

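I have a workflow with two parallel processes (sentinel_run and sentinel_skip) that should run or be skipped based on a condition, and then join together (resolve). I need the tasks directly downstream of either sentinel_ task to be skipped in a cascade, but when the cascade reaches the resolve task, resolve should run unless there is a failure in either upstream process.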

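Based on the documentation, the "none_failed" trigger rule should work:

none_failed: all parents have not failed (failed or upstream_failed), i.e. all parents have succeeded or been skipped

and it is also the answer to a related question.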
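To make the documented rule concrete, here is a minimal pure-Python sketch of "none_failed" read as a predicate over parent states. It does not use Airflow: the function name `none_failed_satisfied` and the plain state strings are assumptions made for illustration, not Airflow's API.

```python
# Illustrative only: plain strings stand in for Airflow task states.
FAILED_STATES = {"failed", "upstream_failed"}


def none_failed_satisfied(parent_states):
    """Return True when no parent is failed or upstream_failed."""
    return not any(state in FAILED_STATES for state in parent_states)


# One branch succeeded and the other was skipped: the rule is satisfied.
print(none_failed_satisfied(["success", "skipped"]))
# A failure somewhere upstream: the rule is not satisfied.
print(none_failed_satisfied(["success", "upstream_failed"]))
```

Under this reading, a resolve task whose parents are one succeeded branch and one skipped branch would be expected to run.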

However, when I implemented a trivial example, that is not what I see:

from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import ShortCircuitOperator
from airflow.utils.dates import days_ago

dag = DAG(
    "testing",
    catchup=False,
    schedule_interval="30 12 * * *",
    default_args={
        "owner": "test@gmail.com",
        "start_date": days_ago(1),
        "catchup": False,
        "retries": 0
    }
)

start = DummyOperator(task_id="start", dag=dag)

sentinel_run = ShortCircuitOperator(task_id="sentinel_run", dag=dag, python_callable=lambda: True)
sentinel_skip = ShortCircuitOperator(task_id="sentinel_skip", dag=dag, python_callable=lambda: False)

a = DummyOperator(task_id="a", dag=dag)
b = DummyOperator(task_id="b", dag=dag)
c = DummyOperator(task_id="c", dag=dag)
d = DummyOperator(task_id="d", dag=dag)
e = DummyOperator(task_id="e", dag=dag)
f = DummyOperator(task_id="f", dag=dag)
g = DummyOperator(task_id="g", dag=dag)

resolve = DummyOperator(task_id="resolve", dag=dag, trigger_rule="none_failed")

start >> sentinel_run >> a >> b >> c >> resolve
start >> sentinel_skip >> d >> e >> f >> resolve

resolve >> g

This code creates the following DAG:

[Image: DAG Design]

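The problem is that the resolve task should execute (because nothing upstream is either upstream_failed or failed), but it is skipped instead.

I have introspected the database, and there are no failed or upstream_failed tasks hiding anywhere, so I can't figure out why it would not honor the "none_failed" logic.

I know about the "ugly workaround" and have implemented it in other workflows, but it adds another task to execute and increases the complexity that new users of the DAG have to grok (especially when you multiply it across multiple tasks...). This was my primary reason for upgrading from Airflow 1.8 to Airflow 1.10, so I'm hoping there is just something obvious I'm missing...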

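2 Answers:

Answer 0: (score: 2)

Documenting this because the issue has bitten me twice, and now I have solved it twice.

Problem Analysis

When you set the log level to DEBUG, you start to see what is going on: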

[2019-10-09 18:30:05,472] {python_operator.py:114} INFO - Done. Returned value was: False
[2019-10-09 18:30:05,472] {python_operator.py:159} INFO - Condition result is False
[2019-10-09 18:30:05,472] {python_operator.py:165} INFO - Skipping downstream tasks...
[2019-10-09 18:30:05,472] {python_operator.py:168} DEBUG - Downstream task_ids [<Task(DummyOperator): f>, <Task(DummyOperator): g>, <Task(DummyOperator): d>, <Task(DummyOperator): resolve>, <Task(DummyOperator): e>]
[2019-10-09 18:30:05,492] {python_operator.py:173} INFO - Done.

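From this you can see that the problem is not that "none_failed" handles the tasks incorrectly, but that the sentinel simulating the skip condition marks all downstream dependencies as skipped directly. This is the behavior of the ShortCircuitOperator: it skips everything downstream, including the tasks downstream of its downstream tasks.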
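The difference between "every task downstream" and "only the direct children" can be sketched without Airflow. In the snippet below, the `EDGES` dict mirrors the DAG from the question, and the helper names `direct_downstream` and `all_downstream` are hypothetical, chosen for illustration. The second helper collects the same set of tasks that appears in the DEBUG line above.

```python
# Edges of the DAG from the question: task -> direct downstream tasks.
EDGES = {
    "start": ["sentinel_run", "sentinel_skip"],
    "sentinel_run": ["a"], "a": ["b"], "b": ["c"], "c": ["resolve"],
    "sentinel_skip": ["d"], "d": ["e"], "e": ["f"], "f": ["resolve"],
    "resolve": ["g"], "g": [],
}


def direct_downstream(task):
    """Only the immediate children of a task."""
    return set(EDGES[task])


def all_downstream(task):
    """Every task reachable from a task (children, grandchildren, ...)."""
    seen = set()
    stack = list(EDGES[task])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(EDGES[current])
    return seen


print(sorted(all_downstream("sentinel_skip")))     # includes resolve and g
print(sorted(direct_downstream("sentinel_skip")))  # only d
```

Because resolve and g are in the transitive set, they are marked skipped before the trigger rule on resolve is ever consulted.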

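Solution

The solution lies in recognizing that it is the behavior of the ShortCircuitOperator, not the TriggerRule, that causes the problem. Once we realize that, it is time to write an operator better suited to the task we are actually trying to accomplish.

I have included the operator I am currently using; I would welcome any input on a better way to handle modifying only the immediate downstream tasks. I'm sure there is a better idiom for "skip just the next one and let the rest cascade according to their trigger rules", but I have already spent more time on this than I wanted to, and I suspect the answer lies even deeper in the internals.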

"""Sentinel Operator Plugin"""

import datetime

from airflow import settings
from airflow.models import SkipMixin, TaskInstance
from airflow.operators.python_operator import PythonOperator
from airflow.plugins_manager import AirflowPlugin
from airflow.utils.state import State


class SentinelOperator(PythonOperator, SkipMixin):
    """
    Allows a workflow to continue only if a condition is met. Otherwise, the
    workflow skips cascading downstream to the next time a viable task
    is identified.

    The SentinelOperator is derived from the PythonOperator. It evaluates a
    condition and stops the workflow if the condition is False. Immediate
    downstream tasks are skipped. If the condition is True, downstream tasks
    proceed as normal.

    The condition is determined by the result of `python_callable`.
    """
    def execute(self, context):
        condition = super(SentinelOperator, self).execute(context)
        self.log.info("Condition result is %s", condition)

        if condition:
            self.log.info('Proceeding with downstream tasks...')
            return

        self.log.info('Skipping downstream tasks...')

        session = settings.Session()

        for task in context['task'].downstream_list:
            ti = TaskInstance(task, execution_date=context['ti'].execution_date)
            self.log.info('Skipping task: %s', ti.task_id)
            ti.state = State.SKIPPED
            ti.start_date = datetime.datetime.now()
            ti.end_date = datetime.datetime.now()
            session.merge(ti)

        session.commit()
        session.close()

        self.log.info("Done.")


class Plugin_SentinelOperator(AirflowPlugin):
    name = "sentinel_operator"
    operators = [SentinelOperator]

With this change in place, the DAG produces the expected result:

[Image: Correct Dag]
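As a rough check on why skipping only the direct children gives the desired result, the cascade can be simulated in plain Python. This is a simplified model, not Airflow's scheduler: it assumes the default "all_success" rule marks a task skipped when any parent is skipped, and the names `PARENTS`, `RULES` and `simulate` are made up for the example.

```python
# Parents of each task in the DAG from the question, listed in topological order.
PARENTS = {
    "start": [],
    "sentinel_run": ["start"], "a": ["sentinel_run"], "b": ["a"], "c": ["b"],
    "sentinel_skip": ["start"], "d": ["sentinel_skip"], "e": ["d"], "f": ["e"],
    "resolve": ["c", "f"],
    "g": ["resolve"],
}
RULES = {"resolve": "none_failed"}  # every other task is assumed to use all_success
FAILED_STATES = ("failed", "upstream_failed")


def simulate(pre_skipped):
    """Return a state per task, given the tasks a sentinel already marked skipped."""
    states = {}
    for task, parents in PARENTS.items():
        parent_states = [states[p] for p in parents]
        rule = RULES.get(task, "all_success")
        if task in pre_skipped:
            states[task] = "skipped"
        elif any(s in FAILED_STATES for s in parent_states):
            states[task] = "upstream_failed"
        elif rule == "all_success" and "skipped" in parent_states:
            states[task] = "skipped"  # the skip cascades under all_success
        else:
            states[task] = "success"
    return states


# ShortCircuitOperator-style: everything downstream of sentinel_skip is pre-skipped.
short_circuit = simulate({"d", "e", "f", "resolve", "g"})
# SentinelOperator-style: only the direct child is pre-skipped.
sentinel = simulate({"d"})

print(short_circuit["resolve"], short_circuit["g"])
print(sentinel["e"], sentinel["f"], sentinel["resolve"], sentinel["g"])
```

In this model the skip still cascades through e and f, while resolve and g run, which matches the behavior described in the question.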

Answer 1: (score: 1)

This appears to be a bug in Airflow. If you would like to see it fixed, add your voice to https://issues.apache.org/jira/browse/AIRFLOW-4453