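Why does an Airflow PythonOperator task fail but exit with return code 0?

Asked: 2019-11-15 15:06:46

Tags: python airflow airflow-scheduler airflow-operator

I have an Airflow DAG that runs a PythonOperator, and I would like to know why my task fails during execution but still exits with return code 0.

Because the return code is zero, I was misled into thinking the task had run successfully.

You can see this in the task log below and in the attached screenshots. Can anyone explain why this happens and suggest how to avoid it?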

Task instance log:

[2019-11-15 22:45:23,633] {base_task_runner.py:115} INFO - Job 736: Subtask http_request_send_push 2019-11-15 22:45:23,632 - 10688 - ERROR - 74 - http_request_send_push:http_request_send_push service trigger-resend-push error::
[2019-11-15 22:45:23,633] {logging_mixin.py:112} INFO -
[2019-11-15 22:45:23,632] {notification.py:74} ERROR - http_request_send_push:http_request_send_push service trigger-resend-push error::
[2019-11-15 22:45:23,633] {python_operator.py:114} INFO - Done. Returned value was: None
[2019-11-15 22:45:25,251] {logging_mixin.py:112} INFO -
[2019-11-15 22:45:25,250] {local_task_job.py:103} INFO - Task exited with return code 0

Task instance log screenshot: [screenshot]

DAG tree view screenshot: [screenshot]

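2 answers:

Answer 0 (score: 0)

Put simply, PythonOperator is just an operator that executes a Python function. Airflow only marks the task as failed when that function raises an exception. If the function catches an error and merely logs it, it returns normally and the task is treated as successful. So if an error occurs and you want the task to end in the failed state, you need to raise an Exception inside the Python callable. See fourth_task in the example code below.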
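The difference between logging an error and raising it can be sketched without Airflow at all. The snippet below is a toy model only: `run_like_a_task`, `swallow_error`, and `propagate_error` are hypothetical names introduced for illustration, and the runner is a stand-in for the general idea, not Airflow's actual implementation.

```python
import logging


def swallow_error():
    # Catches the error and only logs it, so the caller never sees a failure.
    try:
        raise ValueError("push failed")
    except ValueError as exc:
        logging.error("push error: %s", exc)


def propagate_error():
    # Lets the error escape, which is what signals failure to the caller.
    raise ValueError("push failed")


def run_like_a_task(fn):
    """Toy stand-in for a task runner; NOT Airflow's actual code."""
    try:
        fn()
    except Exception:
        return "failed"
    return "success"


print(run_like_a_task(swallow_error))    # success
print(run_like_a_task(propagate_error))  # failed
```

The first callable logs an ERROR line yet still finishes normally, which matches the pattern in the question's log: an error message followed by "Returned value was: None" and return code 0.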

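An alternative to raising an exception is to use ShortCircuitOperator. Note that it skips downstream tasks rather than marking the task itself as failed. The Apache Airflow API reference guide describes it as follows:

"It evaluates a condition and short-circuits the workflow if the condition is False. Any downstream tasks are marked with a state of 'skipped'. If the condition is True, downstream tasks proceed as normal."

See the example code below, which illustrates the difference between PythonOperator and ShortCircuitOperator. It also shows how to raise an Exception so that the task ends in the failed state.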

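import logging
from datetime import datetime

from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator, ShortCircuitOperator

# Assumed DAG definition (not part of the original answer) so the example is self-contained.
dag = DAG(
    dag_id='python_vs_short_circuit',
    start_date=datetime(2019, 11, 1),
    schedule_interval=None)


def first_task(**kwargs):
    # PythonOperator: returns None, task succeeds.
    logging.info("first_task")


def second_task(**kwargs):
    # PythonOperator: returns True, task succeeds.
    logging.info("second_task")
    return True


def third_task(**kwargs):
    # PythonOperator: returning False does NOT fail the task.
    logging.info("third_task")
    return False


def fourth_task(**kwargs):
    # PythonOperator: raising an exception puts the task in the failed state.
    logging.info("fourth_task")
    raise Exception()


def fifth_task(**kwargs):
    # ShortCircuitOperator: True lets downstream tasks run.
    logging.info("fifth_task")
    return True


def sixth_task(**kwargs):
    # ShortCircuitOperator: False marks downstream tasks as skipped.
    logging.info("sixth_task")
    return False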

first_task = PythonOperator(
    task_id='first_task',
    provide_context=True,
    python_callable=first_task,
    dag=dag)
first_task_successor = DummyOperator(task_id='first_task_successor', dag=dag)
first_task_successor.set_upstream(first_task)


second_task = PythonOperator(
    task_id='second_task',
    provide_context=True,
    python_callable=second_task,
    dag=dag)
second_task_successor = DummyOperator(task_id='second_task_successor', dag=dag)
second_task_successor.set_upstream(second_task)


third_task = PythonOperator(
    task_id='third_task',
    provide_context=True,
    python_callable=third_task,
    dag=dag)
third_task_successor = DummyOperator(task_id='third_task_successor', dag=dag)
third_task_successor.set_upstream(third_task)


fourth_task = PythonOperator(
    task_id='fourth_task',
    provide_context=True,
    python_callable=fourth_task,
    dag=dag)
fourth_task_successor = DummyOperator(task_id='fourth_task_successor', dag=dag)
fourth_task_successor.set_upstream(fourth_task)


fifth_task = ShortCircuitOperator(
    task_id='fifth_task',
    provide_context=True,
    python_callable=fifth_task,
    dag=dag)
fifth_task_successor = DummyOperator(task_id='fifth_task_successor', dag=dag)
fifth_task_successor.set_upstream(fifth_task)

sixth_task = ShortCircuitOperator(
    task_id='sixth_task',
    provide_context=True,
    python_callable=sixth_task,
    dag=dag)
sixth_task_successor = DummyOperator(task_id='sixth_task_successor', dag=dag)
sixth_task_successor.set_upstream(sixth_task)

Screenshot: [screenshot]

Answer 1 (score: 0)

@kaxil, the code is below.

#! /usr/bin/env python
# -*- coding: utf-8 -*-

import inspect
import urllib.request
import airflow
from airflow.models import DAG
from airflow.operators.python_operator import PythonOperator
from datetime import timedelta

args = {
    'owner': 'airflow',
    'start_date': airflow.utils.dates.days_ago(1),
    'email': ['test@example.com'],
    'email_on_failure': True,
    'email_on_retry': False,
}

dag = DAG(
    dag_id='airflow_so',
    catchup=False,
    default_args=args,
    dagrun_timeout=timedelta(minutes=5),
    schedule_interval=timedelta(seconds=10)
)

def http_request_send_push(ds, **kwargs):
    endpoint='http://10.19.54.110:8080/v1/trigger-scheduled-push'
    try:
        response = urllib.request.urlopen(endpoint, timeout=10)
    except Exception as e:
        print('%s:%s:%s',
                 inspect.stack()[0][3],
                 type(e),
                 e)
    else:
        req = response.read()
        print('%s:%s:%s',
                 inspect.stack()[0][3],
                 type(req),
                 req)

    endpoint='http://10.19.54.110:8080/v1/trigger-scheduled-repush'
    try:
        response = urllib.request.urlopen(endpoint, timeout=10)
    except Exception as e:
        print('%s:%s:%s',
                 inspect.stack()[0][3],
                 type(e),
                 e)
    else:
        req = response.read()
        print('%s:%s:%s',
                 inspect.stack()[0][3],
                 type(req),
                 req)

http_request_send_push = PythonOperator(
    task_id='http_request_send_push',
    provide_context=True,
    python_callable=http_request_send_push,
    dag=dag
)


def http_request_send_sms(ds, **kwargs):
    endpoint='http://10.19.54.134:8080/v1/scheduleSendSms'
    try:
        response = urllib.request.urlopen(endpoint, timeout=10)
    except Exception as e:
        print('%s:%s:%s',
                 inspect.stack()[0][3],
                 type(e),
                 e)
    else:
        req = response.read()
        print('%s:%s:%s',
                 inspect.stack()[0][3],
                 type(req),
                 req)

    endpoint='http://10.19.54.134:8080/v1/scheduleReSendSms'
    try:
        response = urllib.request.urlopen(endpoint, timeout=10)
    except Exception as e:
        print('%s:%s:%s',
                 inspect.stack()[0][3],
                 type(e),
                 e)
    else:
        req = response.read()
        print('%s:%s:%s',
                 inspect.stack()[0][3],
                 type(req),
                 req)

http_request_send_sms = PythonOperator(
    task_id='http_request_send_sms',
    provide_context=True,
    python_callable=http_request_send_sms,
    dag=dag
)


def http_request_send_email(ds, **kwargs):
    endpoint='http://10.19.54.138:8080/v1/scheduleSendEmail'
    try:
        response = urllib.request.urlopen(endpoint, timeout=10)
    except Exception as e:
        print('%s:%s:%s',
                 inspect.stack()[0][3],
                 type(e),
                 e)
    else:
        req = response.read()
        print('%s:%s:%s',
                 inspect.stack()[0][3],
                 type(req),
                 req)

    endpoint='http://10.19.54.138:8080/v1/scheduleReSendEmail'
    try:
        response = urllib.request.urlopen(endpoint, timeout=10)
    except Exception as e:
        print('%s:%s:%s',
                 inspect.stack()[0][3],
                 type(e),
                 e)
    else:
        req = response.read()
        print('%s:%s:%s',
                 inspect.stack()[0][3],
                 type(req),
                 req)

http_request_send_email = PythonOperator(
    task_id='http_request_send_email',
    provide_context=True,
    python_callable=http_request_send_email,
    dag=dag
)

http_request_send_push >> http_request_send_sms >> http_request_send_email

if __name__ == "__main__":
    dag.cli()
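Each callable above catches every exception and only prints it, so the function returns None and the task is recorded as successful. A minimal sketch of one way to let failures propagate follows. The helper name `call_endpoint` and its `opener` parameter are hypothetical, introduced here so the request function can be swapped out in tests; the endpoints are the ones from the code above. It also uses `logging` with format arguments, since `print('%s:%s:%s', a, b, c)` prints the format string literally instead of interpolating the values.

```python
import logging
import urllib.request


def call_endpoint(endpoint, opener=urllib.request.urlopen, timeout=10):
    """Call the endpoint and return the response body.

    Any error is logged with its traceback and then re-raised, so the
    calling task does not finish normally after a failed request.
    """
    try:
        response = opener(endpoint, timeout=timeout)
        body = response.read()
    except Exception:
        logging.exception("request to %s failed", endpoint)
        raise
    logging.info("request to %s returned %r", endpoint, body)
    return body


def http_request_send_push(ds=None, **kwargs):
    # Same two endpoints as the original callable; the first failure
    # raises, which stops the function and fails the task.
    for endpoint in (
        'http://10.19.54.110:8080/v1/trigger-scheduled-push',
        'http://10.19.54.110:8080/v1/trigger-scheduled-repush',
    ):
        call_endpoint(endpoint)
```

Passed as `python_callable` to the existing PythonOperator, this version should leave the task in the failed state when a request errors out, so `email_on_failure` and any configured retries can take effect. If the second request must still be attempted after the first one fails, one option is to collect the errors in a list and raise once at the end of the function.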