Airflow repeatedly retries tasks with the error "Task is in the 'scheduled' state which is not a valid state for execution"

Asked: 2019-11-06 19:09:15

Tags: airflow airflow-scheduler

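Airflow slows to a crawl once my subdags exceed roughly 1000 tasks. The problem seems to be that Airflow repeatedly tries to start tasks, but each attempt fails with the error "Task is in the 'scheduled' state" instead of running as I would expect. These tasks show up yellow in the Airflow UI until, at some seemingly random later time, they finally start successfully. I have not yet tried a regular (non-subdag) DAG of this size.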

Nothing is actually blocking these DAGs from executing.

This may also be related to the maximum parallelism being reached when the job starts. I really don't know where to look.

With only a small set of jobs, Airflow seems to run fine.

I see many processes that Airflow launches to run tasks:

/usr/local/bin/airflow tasks run <subdag ID> <task id> <execution date> ...

These tasks should run without problems, but in their logs I see the following (I have redacted the task names):

cat /opt/airflow/logs/<subdag ID>/<task ID>/<execution date>/1.log
[2019-11-06 08:56:01,572] {taskinstance.py:618} INFO - Dependencies not met for <TaskInstance: <dag>.<subdag>.<task> 2019-11-06T03:25:02.889939+00:00 [scheduled]>, dependency 'Task Instance State' FAILED: Task is in the 'scheduled' state which is not a valid state for execution. The task must be cleared in order to be run.
[2019-11-06 08:56:01,578] {logging_mixin.py:89} INFO - [2019-11-06 08:56:01,578] {local_task_job.py:86} INFO - Task is not able to be run
[2019-11-06 15:33:31,196] {taskinstance.py:618} INFO - Dependencies not met for <TaskInstance: <dag>.<subdag>.<task> 2019-11-06T03:25:02.889939+00:00 [scheduled]>, dependency 'Task Instance State' FAILED: Task is in the 'scheduled' state which is not a valid state for execution. The task must be cleared in order to be run.
[2019-11-06 15:33:31,204] {logging_mixin.py:89} INFO - [2019-11-06 15:33:31,203] {local_task_job.py:86} INFO - Task is not able to be run
[2019-11-06 15:35:45,554] {taskinstance.py:618} INFO - Dependencies not met for <TaskInstance: <dag>.<subdag>.<task> 2019-11-06T03:25:02.889939+00:00 [scheduled]>, dependency 'Task Instance State' FAILED: Task is in the 'scheduled' state which is not a valid state for execution. The task must be cleared in order to be run.
[2019-11-06 15:35:45,562] {logging_mixin.py:89} INFO - [2019-11-06 15:35:45,562] {local_task_job.py:86} INFO - Task is not able to be run
[2019-11-06 15:36:53,001] {taskinstance.py:618} INFO - Dependencies not met for <TaskInstance: <dag>.<subdag>.<task> 2019-11-06T03:25:02.889939+00:00 [scheduled]>, dependency 'Task Instance State' FAILED: Task is in the 'scheduled' state which is not a valid state for execution. The task must be cleared in order to be run.
[2019-11-06 15:36:53,003] {logging_mixin.py:89} INFO - [2019-11-06 15:36:53,002] {local_task_job.py:86} INFO - Task is not able to be run

Having tasks start and exit like this consumes a lot of CPU. After some hours, the tasks usually do complete.
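To put a number on how often a task is launched and then rejected, the log lines above can be counted per task. The following is a minimal sketch, assuming the log format is exactly the one shown in the excerpt; the function name `count_blocked_starts` is made up for illustration and is not part of Airflow.

```python
import re
from collections import Counter

# Matches the "Dependencies not met" line from the log excerpt and captures
# the task instance name and the state that blocked execution.
# Assumption: the message format is the one shown in the excerpt.
BLOCKED_RE = re.compile(
    r"Dependencies not met for <TaskInstance: (?P<task>\S+) .*?"
    r"Task is in the '(?P<state>\w+)' state which is not a valid state for execution"
)


def count_blocked_starts(lines):
    """Count rejected start attempts per (task, state) pair."""
    counts = Counter()
    for line in lines:
        match = BLOCKED_RE.search(line)
        if match:
            counts[(match.group("task"), match.group("state"))] += 1
    return counts
```

Feeding it the lines of a task log (for example `count_blocked_starts(open(path))`) gives one counter entry per task and blocking state, which makes it easier to see whether a few tasks are retried constantly or all of them are retried occasionally.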

More details:

I am using the LocalExecutor.

Things I have tried:

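I tried setting the scheduler threads (max_threads) to 1, changing run_duration from -1 to 300, and increasing dagbag_import_timeout to 200, which is far longer than my DAGs need to load (they take less than 3 seconds). I also tried dropping the database entirely and reinitializing it.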
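For anyone reproducing this, the values actually in effect can be double-checked by reading them back from the config file. This is a rough sketch using only the standard library; the option names are the ones mentioned above, but which section each option lives in (`[scheduler]` or `[core]`) is my assumption about the `airflow.cfg` layout for this version, and `read_tuned_settings` and `EXAMPLE_CFG` are illustrative names, not my real config.

```python
import configparser

# Option names come from the question; the section each one lives in is an
# assumption about the airflow.cfg layout of this Airflow version.
SETTINGS = {
    "max_threads": "scheduler",
    "run_duration": "scheduler",
    "dagbag_import_timeout": "core",
}


def read_tuned_settings(cfg_text):
    """Return {option: value or None} for the options listed in SETTINGS."""
    # Interpolation is disabled so that '%' characters in values are kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return {
        option: parser.get(section, option, fallback=None)
        for option, section in SETTINGS.items()
    }


# Illustrative fragment holding the values tried above.
EXAMPLE_CFG = """
[core]
dagbag_import_timeout = 200

[scheduler]
max_threads = 1
run_duration = 300
"""

print(read_tuned_settings(EXAMPLE_CFG))
```

Values come back as strings, and an option missing from the file is reported as `None` rather than raising, so a typo in a section or option name is easy to spot.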

Edit:

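I went through the source code and made a change that lets Airflow run smoothly. Unfortunately, it also means Airflow no longer handles cancelled tasks correctly: if anything is queued, those queued tasks will still run with this change.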

diff --git a/airflow/jobs/scheduler_job.py b/airflow/jobs/scheduler_job.py
index a6b42bc..0c79f46 100644
--- a/airflow/jobs/scheduler_job.py
+++ b/airflow/jobs/scheduler_job.py
@@ -1097,7 +1097,7 @@ class SchedulerJob(BaseJob):
                 ignore_all_deps=False,
                 ignore_depends_on_past=False,
                 ignore_task_deps=False,
-                ignore_ti_state=False,
+                ignore_ti_state=True,
                 pool=simple_task_instance.pool,
                 file_path=simple_dag.full_filepath,
                 pickle_id=simple_dag.pickle_id)
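To illustrate why this flag has that side effect, here is a toy model of a state check with an ignore switch. It is not Airflow's code: the function name `state_dep_met` and the contents of `RUNNABLE_STATES` are assumptions made up for illustration. The only facts taken from the logs are that 'scheduled' is rejected and that the check is called 'Task Instance State'.

```python
# Toy model only, not Airflow's implementation. RUNNABLE_STATES is an assumed
# set chosen for illustration; 'scheduled' is left out because the logs
# report it as not valid for execution.
RUNNABLE_STATES = {None, "queued"}


def state_dep_met(state, ignore_ti_state=False):
    """Return True if a task in `state` would be allowed to start."""
    if ignore_ti_state:
        # The check is skipped entirely, so every state passes.
        return True
    return state in RUNNABLE_STATES


# Without the flag, a 'scheduled' task is rejected, as in the logs.
print(state_dep_met("scheduled"))
# With the flag, it is accepted ...
print(state_dep_met("scheduled", ignore_ti_state=True))
# ... but so is a task in any other state, which is the side effect.
print(state_dep_met("success", ignore_ti_state=True))
```

In this model the flag does not make 'scheduled' a valid state; it removes the state check altogether, which would explain why tasks that should have been stopped still get run.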

0 answers