Airflow DAG: custom email when any task fails

Date: 2018-08-07 11:58:22

Tags: airflow

Is there any option in a DAG to customize the email and send it when a task fails? There is an option like 'email_on_failure': True, but it does not provide a way to dynamically add content to the email subject or body.

My DAG looks like the following:

import airflow

from airflow import DAG
from airflow.contrib.operators.databricks_operator import DatabricksSubmitRunOperator
from airflow.operators.email_operator import EmailOperator
from airflow.operators.bash_operator import BashOperator
from airflow.operators.http_operator import SimpleHttpOperator
from airflow.operators.sensors import HttpSensor
import json
from datetime import timedelta
from datetime import datetime
from airflow.models import Variable

args = {
    'owner': 'airflow',
    'email': ['test@gmail.com'],
    'email_on_failure': True,
    'email_on_retry': True,
    'depends_on_past': False,
    'start_date': airflow.utils.dates.days_ago(0),
    'max_active_runs':10
}

dag = DAG(dag_id='TEST_DAG', default_args=args, schedule_interval='@once')

new_cluster = {
    'spark_version': '4.0.x-scala2.11',
    'node_type_id': 'Standard_D16s_v3',
    'num_workers': 3,
    'spark_conf':{
        'spark.hadoop.javax.jdo.option.ConnectionDriverName':'org.postgresql.Driver',
        .....
    },
    'custom_tags':{
        'ApplicationName':'TEST',
        .....
    }
}

t1 = DatabricksSubmitRunOperator(
  task_id='t1',
  dag=dag,
  new_cluster=new_cluster,
  ......
)

t2 = SimpleHttpOperator(
    task_id='t2',
    method='POST',
    ........    
)

t2.set_upstream(t1)

t3 = SimpleHttpOperator(
    task_id='t3',
    method='POST',
   .....
 )

t3.set_upstream(t2)

send_mail = EmailOperator (
    dag=dag,
    task_id="send_mail",
    to=["test@gmail.com"],
    subject=" Success",
    html_content='<h3>Success</h3>')

send_mail.set_upstream(t3)

In the success case the send_mail task sends a customized email to the specified email IDs.

But if a task fails, I want to customize the email and send it to the specified email IDs. That is not happening; on failure, the email is sent with the default subject and body.

Any help would be appreciated.

3 Answers:

Answer 0 (score: 2)

I use on_failure_callback for this. Note that it will trigger for every failed task in the DAG.

import logging
from airflow.operators.email_operator import EmailOperator

def report_failure(context):
    # include this check if you only want to get one email per DAG
    dag_id = context['dag'].dag_id
    if context['task_instance'].xcom_pull(task_ids=None, dag_id=dag_id, key=dag_id):
        logging.info("Other failing task has been notified.")
        return
    send_email = EmailOperator(...)
    send_email.execute(context)


dag = DAG(
    ...,
    default_args={
        ...,
        "on_failure_callback": report_failure
    }
)
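For reference, here is a fuller, self-contained sketch of the same pattern (not from the original answer): the callback pushes an XCom marker after sending, so the xcom_pull check actually suppresses duplicate emails within a DAG run. The DAG id, task ids, recipient, and subject/body below are placeholder values.

import logging
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.email_operator import EmailOperator


def report_failure(context):
    ti = context['task_instance']
    dag_id = context['dag'].dag_id

    # Skip if another failed task in this DAG run has already sent the alert.
    if ti.xcom_pull(task_ids=None, dag_id=dag_id, key=dag_id):
        logging.info("Other failing task has been notified.")
        return

    send_email = EmailOperator(
        task_id='report_failure',  # hypothetical task id
        to=['test@gmail.com'],
        subject='Failure in DAG {}'.format(dag_id),
        html_content='<h3>Task {} failed.</h3>'.format(ti.task_id))
    send_email.execute(context)

    # Mark this DAG run as notified so later failures stay silent.
    ti.xcom_push(key=dag_id, value=True)


dag = DAG(
    dag_id='FAILURE_CALLBACK_DEMO',  # hypothetical DAG id
    start_date=datetime(2018, 8, 1),
    schedule_interval='@once',
    default_args={'on_failure_callback': report_failure})

t1 = BashOperator(task_id='t1', bash_command='exit 1', dag=dag)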

Answer 1 (score: 2)

I managed this with the help of the Airflow TriggerRule, as in the sample DAG below:

import airflow

from airflow import DAG
from airflow.contrib.operators.databricks_operator import DatabricksSubmitRunOperator
from airflow.operators.email_operator import EmailOperator
from airflow.operators.bash_operator import BashOperator
from airflow.operators.http_operator import SimpleHttpOperator
from airflow.operators.sensors import HttpSensor
import json
from datetime import timedelta
from datetime import datetime
from airflow.models import Variable
from airflow.utils.trigger_rule import TriggerRule

args = {
    'owner': 'airflow',
    'email': ['test@gmail.com'],
    'email_on_failure': True,
    'email_on_retry': True,
    'depends_on_past': False,
    'start_date': airflow.utils.dates.days_ago(0),
    'max_active_runs':10
}

dag = DAG(dag_id='TEST_DAG', default_args=args, schedule_interval='@once')

new_cluster = {
    'spark_version': '4.0.x-scala2.11',
    'node_type_id': 'Standard_D16s_v3',
    'num_workers': 3,
    'spark_conf':{
        'spark.hadoop.javax.jdo.option.ConnectionDriverName':'org.postgresql.Driver',
        .....
    },
    'custom_tags':{
        'ApplicationName':'TEST',
        .....
    }
}

t1 = DatabricksSubmitRunOperator(
  task_id='t1',
  dag=dag,
  new_cluster=new_cluster,
  ......
)

t2 = SimpleHttpOperator(
    task_id='t2',
    trigger_rule=TriggerRule.ONE_SUCCESS,
    method='POST',
    ........    
)

t2.set_upstream(t1)

t3 = SimpleHttpOperator(
    task_id='t3',
    trigger_rule=TriggerRule.ONE_SUCCESS,
    method='POST',
   .....
 )

t3.set_upstream(t2)

AllTaskSuccess = EmailOperator (
    dag=dag,
    trigger_rule=TriggerRule.ALL_SUCCESS,
    task_id="AllTaskSuccess",
    to=["test@gmail.com"],
    subject="All Task completed successfully",
    html_content='<h3>All Task completed successfully" </h3>')

AllTaskSuccess.set_upstream([t1, t2,t3])

t1Failed = EmailOperator (
    dag=dag,
    trigger_rule=TriggerRule.ONE_FAILED,
    task_id="t1Failed",
    to=["test@gmail.com"],
    subject="T1 Failed",
    html_content='<h3>T1 Failed</h3>')

t1Failed.set_upstream([t1])

t2Failed = EmailOperator (
    dag=dag,
    trigger_rule=TriggerRule.ONE_FAILED,
    task_id="t2Failed",
    to=["test@gmail.com"],
    subject="T2 Failed",
    html_content='<h3>T2 Failed</h3>')

t2Failed.set_upstream([t2])

t3Failed = EmailOperator (
    dag=dag,
    trigger_rule=TriggerRule.ONE_FAILED,
    task_id="t3Failed",
    to=["test@gmail.com"],
    subject="T3 Failed",
    html_content='<h3>T3 Failed</h3>')

t3Failed.set_upstream([t3])

Trigger Rules

Though the normal workflow behavior is to trigger tasks when all their directly upstream tasks have succeeded, Airflow allows for more complex dependency settings.

All operators have a trigger_rule argument which defines the rule by which the generated task gets triggered. The default value for trigger_rule is all_success and can be defined as "trigger this task when all directly upstream tasks have succeeded". All other rules described here are based on direct parent tasks and are values that can be passed to any operator while creating tasks:

all_success: (default) all parents have succeeded

all_failed: all parents are in a failed or upstream_failed state

all_done: all parents are done with their execution

one_failed: fires as soon as at least one parent has failed, it does not wait for all parents to be done

one_success: fires as soon as at least one parent succeeds, it does not wait for all parents to be done

dummy: dependencies are just for show, trigger at will

Reference: https://airflow.apache.org/concepts.html
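Of the rules above, the sample DAG only uses one_success, one_failed and all_success. As a minimal sketch (not part of the original answer), an all_done email task could be attached to the same DAG to send a single summary once every upstream task has finished, whether it succeeded or failed; the task id, subject and body here are illustrative:

RunSummary = EmailOperator(
    dag=dag,
    trigger_rule=TriggerRule.ALL_DONE,
    task_id="RunSummary",
    to=["test@gmail.com"],
    subject="TEST_DAG finished",
    html_content='<h3>All tasks have finished (success or failure)</h3>')

RunSummary.set_upstream([t1, t2, t3])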

Answer 2 (score: 0)

Currently using Airflow 1.10.1:

It looks like custom email options can be configured in airflow.cfg under the [email] section, using the following Jinja templates:

[email]
email_backend = airflow.utils.email.send_email_smtp
subject_template = /path/to/my_subject_template_file
html_content_template = /path/to/my_html_content_template_file

A custom message can be created by using task instance information in html_content_template, which in turn is a Jinja template.
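For example, a minimal sketch of what my_html_content_template_file could contain, assuming the template variables Airflow passes to its email alert (ti, try_number, max_tries, exception_html); the exact wording is illustrative:

<h3>Task {{ ti.task_id }} in DAG {{ ti.dag_id }} failed</h3>
Try {{ try_number }} out of {{ max_tries + 1 }}<br>
Exception:<br>{{ exception_html }}<br>
Log: <a href="{{ ti.log_url }}">Link</a><br>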

More details at https://airflow.apache.org/docs/stable/howto/email-config.html