Retrying a task may not always make sense. For example, if the task is a sensor that failed because its credentials are invalid, any future retry is bound to fail as well. How can I define operators that can decide whether a retry is sensible?
In Airflow 1.10.6, the logic that decides whether a task should be retried lives in airflow.models.taskinstance.TaskInstance.handle_failure, which makes it impossible to define this behavior at the operator level: it is the responsibility of the task instance, not of the operator.
An ideal situation would be if the handle_failure method were defined on the operator side; then we could override it as needed.
The only workaround I have found is to use a BranchPythonOperator that "tests" whether the task can run. For the sensor above, for example: check whether the login credentials are valid and, if so, route the DAG flow to the sensor; otherwise fail (or branch to another task).
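A minimal sketch of that idea might look as follows; credentials_are_valid and the task ids here are hypothetical placeholders, and the real sensor would take the place of the DummyOperator:

from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import BranchPythonOperator

def credentials_are_valid():
    # Hypothetical check, e.g. a cheap authenticated call against the service
    return True

def choose_branch(**context):
    # Route the flow to the sensor only if the credentials work,
    # otherwise divert to a fallback task
    return "my_sensor" if credentials_are_valid() else "handle_bad_credentials"

with DAG("branch_workaround_dag", start_date=datetime(2019, 11, 28), catchup=False) as dag:
    check_credentials = BranchPythonOperator(
        task_id="check_credentials",
        python_callable=choose_branch,
    )
    my_sensor = DummyOperator(task_id="my_sensor")  # stand-in for the real sensor
    handle_bad_credentials = DummyOperator(task_id="handle_bad_credentials")

    check_credentials >> [my_sensor, handle_bad_credentials]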
Is my analysis correct? Is there a better workaround?
Answer 0 (score: 1)
You can get the corresponding task instance from the context and redefine its number of retries, for example:
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

default_args = {
    "owner": "Airflow",
    "start_date": datetime(2011, 1, 1, 1, 1),
}

def fun(*, task_instance, **context):
    task_instance.max_tries = 0  # reset retries to 0
    raise Exception()

with DAG("my_dag", default_args=default_args, catchup=False) as dag:
    op = PythonOperator(
        task_id="my_op",
        python_callable=fun,
        provide_context=True,
        retries=100000,  # set a lot of retries
        retry_delay=timedelta(seconds=1),
    )
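As far as I can tell from the 1.10 source, a failed task is only marked up_for_retry while task.retries is non-zero and try_number <= max_tries, so forcing max_tries to 0 makes the current failure final despite the huge retries value.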
Personally, though, I would not redefine the number of retries dynamically: it changes the behavior of the workflow from inside an operator in a non-obvious way and thus makes the workflow harder to reason about. I would simply let the task fail as many times as configured, regardless of the reason for the failure, and if the retries are expensive I would lower their number (e.g. to 1 or 0).
Answer 1 (score: 1)
Answering my own question: by modifying the self.retries instance variable, which is available in every operator, inside the execute method we can dynamically suppress any further retries.

In the following example:
from datetime import datetime, timedelta

from airflow import DAG
from airflow.models import BaseOperator

class PseudoSensor(BaseOperator):
    def __init__(
            self,
            s3_status_code_mock,
            *args,
            **kwargs):
        super().__init__(*args, **kwargs)
        self.s3_status_code_mock = s3_status_code_mock

    def execute(self, context):
        # Try to read S3, Redshift, blah blah
        pass
        # The query returned a status code, which we mock when the Sensor is initialized
        if self.s3_status_code_mock == 0:
            # Success
            return 0
        elif self.s3_status_code_mock == 1:
            # Error, but should retry while I still can
            raise Exception("Retryable error. Won't change retries of operator.")
        elif self.s3_status_code_mock == 2:
            # Unrecoverable error. Should fail without future retries.
            self.retries = 0
            raise Exception("Unrecoverable error. Will set retries to 0.")

# A separate function so we don't make the globals dirty
def createDAG():
    # Default (but overridable) arguments for Operator instantiations
    default_args = {
        'owner': 'Supay',
        'depends_on_past': False,
        'start_date': datetime(2019, 11, 28),
        'retry_delay': timedelta(seconds=1),
        'retries': 3,
    }

    with DAG("dynamic_retries_dag", default_args=default_args, schedule_interval=timedelta(days=1), catchup=False) as dag:
        # Sensor 0: should succeed on the first try
        sensor_0 = PseudoSensor(
            task_id="sensor_0",
            provide_context=True,
            s3_status_code_mock=0,
        )
        # Sensor 1: should fail after 3 tries
        sensor_1 = PseudoSensor(
            task_id="sensor_1",
            provide_context=True,
            s3_status_code_mock=1,
        )
        # Sensor 2: should fail after 1 try
        sensor_2 = PseudoSensor(
            task_id="sensor_2",
            provide_context=True,
            s3_status_code_mock=2,
        )

        dag >> sensor_0
        dag >> sensor_1
        dag >> sensor_2

    globals()[dag.dag_id] = dag

# Run everything
createDAG()
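This works (at least in 1.10.6, if I read TaskInstance.handle_failure correctly) because the operator instance that raised is the same task object handle_failure consults afterwards: retry eligibility requires task.retries to be non-zero, so zeroing self.retries inside execute turns the current failure into a final one.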