I need to implement a waiting task in Airflow. The wait time is on the order of a couple of hours.
First, TimeDeltaSensor simply does not work for this.
SLEEP_MINUTES_1ST = 11
sleep_task_1 = TimeDeltaSensor(
    task_id="sleep_for_11_min",
    delta=timedelta(minutes=SLEEP_MINUTES_1ST),
)
With a daily schedule, for example:
schedule_interval='30 06 * * *'
it simply waits until the next scheduled run:
[2020-01-15 18:10:21,800] {time_delta_sensor.py:45} INFO - Checking if the time (2020-01-16 06:41:00+00:00) has come
This is quite obvious from the code: https://github.com/apache/airflow/blob/master/airflow/sensors/time_delta_sensor.py#L43
(not to mention the known bug when the schedule is None or @once)
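The target time in the log line above can be reproduced with plain datetime arithmetic. This is a minimal sketch, assuming (as the linked source suggests) that the sensor adds delta to the schedule following execution_date rather than to the task's start time. following_schedule is a hypothetical stand-in for the DAG's own method, hard-coded to the '30 06 * * *' schedule, and the execution_date of a manual run at 17:10 UTC is an assumption as well.

```python
from datetime import datetime, timedelta, timezone


def following_schedule(execution_date):
    # Hypothetical stand-in for the DAG's schedule lookup:
    # returns the next 06:30 UTC strictly after execution_date.
    candidate = execution_date.replace(hour=6, minute=30, second=0, microsecond=0)
    if candidate <= execution_date:
        candidate += timedelta(days=1)
    return candidate


def time_delta_sensor_target(execution_date, delta):
    # The target is "next schedule + delta", not "now + delta".
    return following_schedule(execution_date) + delta


# Assumed execution_date of a run triggered on 2020-01-15 at 17:10 UTC.
execution_date = datetime(2020, 1, 15, 17, 10, tzinfo=timezone.utc)
target = time_delta_sensor_target(execution_date, timedelta(minutes=11))
print(target.isoformat())  # 2020-01-16T06:41:00+00:00
```

Under these assumptions an 11 minute delta turns into a wait of more than 13 hours, which matches the behaviour described above.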
The next attempt was to use TimeSensor like this:
SLEEP_MINUTES_1ST = 11
sleep_task_1 = TimeSensor(
    task_id="sleep_for_11_min",
    provide_context=True,
    target_time=(timezone.utcnow()+timedelta(minutes=SLEEP_MINUTES_1ST)).time(),
    trigger_rule=TriggerRule.NONE_FAILED
)
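This actually worked well, but in poke mode it occupies a worker for the entire wait time. I was advised to use reschedule mode, but simply adding:
mode='reschedule',
produces a new target time on every rescheduled check, so the task never finishes, like this: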
[2020-01-15 15:36:42,818] {time_sensor.py:39} INFO - Checking if the time (14:47:42.707565) has come
[2020-01-15 15:36:42,981] {taskinstance.py:1054} INFO - Rescheduling task, marking task as UP_FOR_RESCHEDULE
....
[2020-01-15 15:38:51,306] {time_sensor.py:39} INFO - Checking if the time (14:49:51.079783) has come
[2020-01-15 15:38:51,331] {taskinstance.py:1054} INFO - Rescheduling task, marking task as UP_FOR_RESCHEDULE
...
[2020-01-15 15:41:00,587] {time_sensor.py:39} INFO - Checking if the time (14:52:00.202168) has come
[2020-01-15 15:41:00,614] {taskinstance.py:1054} INFO - Rescheduling task, marking task as UP_FOR_RESCHEDULE
.....
(Note that Airflow mixes UTC and my time zone, UTC+1, in the logs here)
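One plausible reading of the log, where the target time moves forward with every check: in reschedule mode each check starts the task afresh, the DAG file is evaluated again, and timezone.utcnow() produces a new target. The sketch below simulates that without Airflow. build_target and poke are hypothetical stand-ins for the expression in the DAG file and for the comparison TimeSensor performs.

```python
from datetime import datetime, timedelta

SLEEP_MINUTES_1ST = 11


def build_target(parse_time):
    # Stand-in for the DAG-file expression: "now + 11 minutes" as a time of day.
    return (parse_time + timedelta(minutes=SLEEP_MINUTES_1ST)).time()


def poke(now, target_time):
    # Stand-in for the sensor check: has the time of day passed the target?
    return now.time() > target_time


start = datetime(2020, 1, 15, 14, 36, 42)
checks = [start + timedelta(minutes=2 * i) for i in range(10)]

# Target rebuilt at every check: it always stays 11 minutes ahead.
moving = [poke(now, build_target(now)) for now in checks]

# Target computed once and kept: the check succeeds after 11 minutes.
fixed_target = build_target(start)
fixed = [poke(now, fixed_target) for now in checks]

print(any(moving))        # False
print(fixed.index(True))  # 6
```

With a moving target no check ever succeeds, while a target fixed at the first evaluation is passed at the seventh check, 12 minutes after the start.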
The next attempt was to generate target_time for TimeSensor relative to the DAG's execution date. Several attempts at this failed, for example:
task_target_time = '{{ macros.datetime.fromtimestamp((execution_date + macros.timedelta(minutes=SLEEP_MINUTES_1ST).timestamp()) }}'
sleep_task_1 = TimeSensor(
    task_id="sleep_for_11_min",
    provide_context=True,
    # target_time=(timezone.utcnow()+timedelta(minutes=SLEEP_MINUTES_1ST)).time(),
    # target_time = task_target_time,
    # target_time=datetime.strptime('{{ execution_date + macros.timedelta(minutes=SLEEP_MINUTES_1ST) }}','%Y-%m-%dT%H:%M:%S'),
    # target_time='{{ execution_date }}'+ timedelta(minutes=SLEEP_MINUTES_1ST),
    target_time = ('{{ task_instance.execution_date }}' + timedelta(minutes=SLEEP_MINUTES_1ST)).time(),
    poke_interval=120,
    mode='reschedule',
    timeout=10*60*60,
    trigger_rule=TriggerRule.NONE_FAILED
)
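The commented-out lines (target_time ...) show only some of the combinations I tried. Some failed immediately when the DAG was created, others produced errors like this at run time: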
[2020-01-15 17:56:39,388] {time_sensor.py:39} INFO - Checking if the time ({{ macros.datetime.fromtimestamp((execution_date + macros.timedelta(minutes=SLEEP_MINUTES_1ST).timestamp()) }}) has come
[2020-01-15 17:56:39,389] {taskinstance.py:1058} ERROR - '>' not supported between instances of 'datetime.time' and 'str'
Traceback (most recent call last):
File "/data/airflow_ak/.direnv/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 930, in _run_raw_task
result = task_copy.execute(context=context)
File "/data/airflow_ak/.direnv/lib/python3.6/site-packages/airflow/sensors/base_sensor_operator.py", line 107, in execute
while not self.poke(context):
File "/data/airflow_ak/.direnv/lib/python3.6/site-packages/airflow/sensors/time_sensor.py", line 40, in poke
return timezone.utcnow().time() > self.target_time
TypeError: '>' not supported between instances of 'datetime.time' and 'str'
[2020-01-15 17:56:39,390] {taskinstance.py:1089} INFO - Marking task as FAILED.
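I think I understand the theory: the task context, including execution_date, is not available when the operator is created, only at run time. Jinja is supposed to return a Pendulum object that I could convert to a time, but at creation time the Jinja template is just a string, so the Pendulum methods are not available.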
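The TypeError from the traceback can be reproduced without Airflow. This sketch assumes what the log line suggests, namely that target_time reaches the check as the unrendered template text, so a datetime.time ends up being compared with a str. poke here is a hypothetical stand-in for the comparison shown in the traceback.

```python
from datetime import datetime, time, timezone

# The template text as it appears when it is never rendered.
unrendered = "{{ execution_date + macros.timedelta(minutes=11) }}"


def poke(now, target_time):
    # Same kind of comparison as the line shown in the traceback.
    return now.time() > target_time


now = datetime(2020, 1, 15, 16, 56, 39, tzinfo=timezone.utc)

try:
    poke(now, unrendered)
    error = None
except TypeError as exc:
    error = str(exc)

print(error)

# With a real datetime.time object the comparison works.
ok = poke(now, time(16, 41))
print(ok)
```

The comparison only works once the value is an actual datetime.time, which a string passed in at DAG creation can never be.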
But why is it so hard to create something as simple as
sleep 1000
in Airflow?
(Airflow: v1.10.6, Python 3.6.8)
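Answer 0 (score: 2)
Here is an Airflow sensor that "sleeps" the way I think TimeDeltaSensor should sleep.
It is best used in 'reschedule' mode.
It sleeps relative to the time at which the task instance starts. By default, TimeSleepSensor pokes only once after the sleep duration, and its default timeout makes it time out shortly after the requested sleep_duration if something causes that poke to fail.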
from datetime import timedelta

from airflow.sensors.base_sensor_operator import BaseSensorOperator
from airflow.utils import timezone
from airflow.utils.decorators import apply_defaults


class TimeSleepSensor(BaseSensorOperator):
    """
    Waits for specified time interval relative to task instance start

    :param sleep_duration: time after which the job succeeds
    :type sleep_duration: datetime.timedelta
    """

    @apply_defaults
    def __init__(self, sleep_duration, *args, **kwargs):
        super(TimeSleepSensor, self).__init__(*args, **kwargs)
        self.sleep_duration = sleep_duration
        # Unless overridden, wait one full sleep duration between pokes
        # and time out 30 seconds after the sleep duration.
        self.poke_interval = kwargs.get('poke_interval', int(sleep_duration.total_seconds()))
        self.timeout = kwargs.get('timeout', int(sleep_duration.total_seconds()) + 30)

    def poke(self, context):
        ti = context["ti"]
        sensor_task_start_date = ti.start_date
        target_time = sensor_task_start_date + self.sleep_duration
        now = timezone.utcnow()
        self.log.info(
            "Checking if the target time ({} - check:{}) has come - time to go: {}, start: {}, initial sleep_duration: {}"
            .format(target_time, now > target_time, target_time - now, sensor_task_start_date, self.sleep_duration)
        )
        return now > target_time
Usage is simple:
sleep_task = TimeSleepSensor(
    task_id="sleep_task",
    sleep_duration=timedelta(minutes=1800),
    mode='reschedule'
)
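To make the defaults concrete, the arithmetic from the constructor can be checked on its own. default_sensor_settings is a hypothetical helper that only mirrors the two kwargs.get(...) fallbacks above; it is not part of Airflow.

```python
from datetime import timedelta


def default_sensor_settings(sleep_duration):
    # Mirrors the fallbacks in TimeSleepSensor.__init__:
    # poke once per sleep duration, time out 30 seconds later.
    seconds = int(sleep_duration.total_seconds())
    return {"poke_interval": seconds, "timeout": seconds + 30}


settings = default_sensor_settings(timedelta(minutes=1800))
print(settings)  # {'poke_interval': 108000, 'timeout': 108030}
```

Note that timedelta(minutes=1800) in the usage example is a 30 hour sleep.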
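Answer 1 (score: 1)
TimeSensor goes into a reschedule loop because target_time is recomputed to a different value every time the constraint is checked, so the constraint can never be satisfied.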
target_time=(timezone.utcnow()+timedelta(minutes=SLEEP_MINUTES_1ST)).time(),
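When using TimeSensor this way, target_time must be set to a fixed time value: the latest time at which you expect the condition to be met.
I suggest using TimeDeltaSensor in reschedule mode. It lets the waiting task be scheduled, run if the constraint check is satisfied, and be rescheduled otherwise.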
SLEEP_MINUTES_1ST = 11
sleep_task_1 = TimeDeltaSensor(
    task_id="sleep_for_11_min",
    delta=timedelta(minutes=SLEEP_MINUTES_1ST),
    mode='reschedule'
)
You could also subclass BaseSensorOperator, similar to TimeSensor, to run a liveness check that tests whether the task has been released from sleep. For example,
from airflow.sensors.base_sensor_operator import BaseSensorOperator
from airflow.utils import timezone
from airflow.utils.decorators import apply_defaults

XCOM_KEY = 'start_date'


class ReleaseProbe(BaseSensorOperator):
    """
    Waits until the job is released from sleep.

    :param sleep_duration: sleep duration of job before it runs
    :type sleep_duration: datetime.timedelta
    """

    @apply_defaults
    def __init__(self, sleep_duration, *args, **kwargs):
        super(ReleaseProbe, self).__init__(*args, **kwargs)
        self.sleep_duration = sleep_duration

    def poke(self, context):
        self.log.info(
            'Checking if task is released after (%s) sleep, execution date is: %s',
            self.sleep_duration, context['execution_date'])
        ti = context['ti']
        # The first poke records the start time in XCom; later pokes compare against it.
        # Assumption: the XCom value is still present on the next poke
        # (verify this for reschedule mode on your Airflow version).
        start_date = ti.xcom_pull(key=XCOM_KEY, task_ids=ti.task_id)
        if not start_date:
            ti.xcom_push(key=XCOM_KEY, value=timezone.utcnow())
            return False
        return timezone.utcnow() - start_date > self.sleep_duration
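Answer 2 (score: 0)
Well, this is not exactly a solution to your problem, but an alternative (tested) way.
What you can do is simply create a bash operator and call sleep. I believe it will occupy only one thread, just like the sleep command in a terminal.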
sleep
This way you get the functionality in the simplest possible way, without using any complicated operators.
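A minimal sketch of that idea, under stated assumptions: sleep_command is a hypothetical helper that builds the shell command, and the commented wiring uses the BashOperator import path of Airflow 1.10 with a made-up task_id. It has not been taken from the answer itself.

```python
from datetime import timedelta


def sleep_command(duration):
    # Hypothetical helper: builds the shell command for a fixed-length sleep.
    return "sleep {}".format(int(duration.total_seconds()))


command = sleep_command(timedelta(hours=2))
print(command)  # sleep 7200

# Assumed wiring inside a DAG file (Airflow 1.10 import path):
#   from airflow.operators.bash_operator import BashOperator
#   sleep_task = BashOperator(task_id="sleep_2h", bash_command=command)
```

Like a sensor in poke mode, a task that runs sleep keeps its worker slot busy for the whole duration.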