I'm finding Airflow retries inconsistent, or I may be missing something. Basically, I set retries to 3 in the default args:
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2015, 6, 1),
    'email': ['airflow@example.com'],
    'email_on_failure': True,
    'email_on_retry': True,
    'retries': 3
}
and I also set it to 3 on the PythonOperators:
t1 = PythonOperator(
    task_id='task_1',
    python_callable=task_1_process,
    retries=3,
    dag=dag)
t2 = SparkSubmitOperator(
    task_id="spark_submit_task",
    conn_id="spark_default",
    java_class="${JAVA_CLASS}",
    application='/path/to/myjar-0.1.jar',
    application_args=["${ARG1}"],
    conf=SPARK_CONF,
    dag=dag)
t3 = PythonOperator(
    task_id='task_3',
    python_callable=task_3_process,
    retries=3,
    dag=dag)
I assumed this means that if the first run fails, the job is retried 3 times, so I should get:
Try 1 out of 4 [up_for_retry]
Try 2 out of 4 [up_for_retry]
Try 3 out of 4 [up_for_retry]
Try 4 out of 4 [failed]
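To make that expectation concrete, here is a minimal sketch of the counting I have in mind. It is plain Python, not Airflow's actual logic, and `expected_attempts` is a hypothetical helper I made up for illustration. It assumes a task that fails on every attempt:

```python
def expected_attempts(retries):
    """Return the status lines I would expect for a task that always fails.

    Assumption: total tries = 1 initial run + `retries` retries.
    """
    total = retries + 1
    lines = []
    for attempt in range(1, total + 1):
        # Every attempt except the last should leave the task up_for_retry.
        state = "up_for_retry" if attempt < total else "failed"
        lines.append(f"Try {attempt} out of {total} [{state}]")
    return lines


for line in expected_attempts(3):
    print(line)
```

Under this assumption the counter never exceeds the total, which is why the emails surprise me.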
But in the emails I receive:
Airflow alert: <TaskInstance: MyJob.spark_submit_task 2018-12-06T00:00:00+00:00 [up_for_retry]>
Try 2 out of 4
Try 3 out of 4
Try 4 out of 4
Airflow alert: <TaskInstance: MyJob.spark_submit_task 2018-12-06T00:00:00+00:00 [failed]>
Try 5 out of 4
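That is, three retries and then one more try, after which the task is marked failed. So what is the logic behind these retries, and how should I interpret "Try 5 out of 4", which sounds quite illogical?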