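Airflow slows to a crawl when my subdags have more than about 1000 tasks. The problem seems to be that Airflow repeatedly tries to launch a task, but the launch fails with the error "Task is in the 'scheduled' state" instead of running the task as I would expect. These tasks show as yellow in the Airflow UI until, at some later and seemingly random time, they launch successfully. I have not yet tried a normal (non-subdag) DAG of this size.
As far as I can tell, nothing should be blocking these DAGs from executing.
It may also be related to hitting the maximum parallelism when the job starts. I really don't know where to look.
With only a small set of jobs, Airflow seems to run fine.
I see many processes that Airflow has launched to run tasks: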
/usr/local/bin/airflow tasks run <subdag ID> <task id> <execution date> ...
These tasks should run without problems, but in their logs I see the following (I have redacted the task names):
cat /opt/airflow/logs/<subdag ID>/<task ID>/<execution date>/1.log
[2019-11-06 08:56:01,572] {taskinstance.py:618} INFO - Dependencies not met for <TaskInstance: <dag>.<subdag>.<task> 2019-11-06T03:25:02.889939+00:00 [scheduled]>, dependency 'Task Instance State' FAILED: Task is in the 'scheduled' state which is not a valid state for execution. The task must be cleared in order to be run.
[2019-11-06 08:56:01,578] {logging_mixin.py:89} INFO - [2019-11-06 08:56:01,578] {local_task_job.py:86} INFO - Task is not able to be run
[2019-11-06 15:33:31,196] {taskinstance.py:618} INFO - Dependencies not met for <TaskInstance: <dag>.<subdag>.<task> 2019-11-06T03:25:02.889939+00:00 [scheduled]>, dependency 'Task Instance State' FAILED: Task is in the 'scheduled' state which is not a valid state for execution. The task must be cleared in order to be run.
[2019-11-06 15:33:31,204] {logging_mixin.py:89} INFO - [2019-11-06 15:33:31,203] {local_task_job.py:86} INFO - Task is not able to be run
[2019-11-06 15:35:45,554] {taskinstance.py:618} INFO - Dependencies not met for <TaskInstance: <dag>.<subdag>.<task> 2019-11-06T03:25:02.889939+00:00 [scheduled]>, dependency 'Task Instance State' FAILED: Task is in the 'scheduled' state which is not a valid state for execution. The task must be cleared in order to be run.
[2019-11-06 15:35:45,562] {logging_mixin.py:89} INFO - [2019-11-06 15:35:45,562] {local_task_job.py:86} INFO - Task is not able to be run
[2019-11-06 15:36:53,001] {taskinstance.py:618} INFO - Dependencies not met for <TaskInstance: <dag>.<subdag>.<task> 2019-11-06T03:25:02.889939+00:00 [scheduled]>, dependency 'Task Instance State' FAILED: Task is in the 'scheduled' state which is not a valid state for execution. The task must be cleared in order to be run.
[2019-11-06 15:36:53,003] {logging_mixin.py:89} INFO - [2019-11-06 15:36:53,002] {local_task_job.py:86} INFO - Task is not able to be run
Having tasks start up and exit like this consumes a lot of CPU. After several hours, the tasks usually do complete.
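To put a number on how often a task instance is relaunched and rejected, a rough sketch like the one below could tally the "Dependencies not met" lines in a task log. The regular expression is an assumption derived only from the excerpt above, and `count_rejections` and `SAMPLE` are hypothetical names, not part of Airflow.

```python
import re
from collections import Counter

# Pattern inferred from the log excerpt above (an assumption, not an Airflow API).
REJECTED = re.compile(
    r"^\[(?P<ts>[\d\- :,]+)\] \{taskinstance\.py:\d+\} INFO - "
    r"Dependencies not met for <TaskInstance: (?P<ti>.+?) \[(?P<state>\w+)\]>"
)


def count_rejections(lines):
    """Count rejected launch attempts per (task instance, reported state)."""
    counts = Counter()
    for line in lines:
        match = REJECTED.match(line)
        if match:
            counts[(match.group("ti"), match.group("state"))] += 1
    return counts


# Two lines in the same shape as the excerpt; only the first is a rejection.
SAMPLE = [
    "[2019-11-06 08:56:01,572] {taskinstance.py:618} INFO - Dependencies not met "
    "for <TaskInstance: <dag>.<subdag>.<task> 2019-11-06T03:25:02.889939+00:00 "
    "[scheduled]>, dependency 'Task Instance State' FAILED: ...",
    "[2019-11-06 08:56:01,578] {logging_mixin.py:89} INFO - Task is not able to be run",
]

for (task_instance, state), n in count_rejections(SAMPLE).most_common():
    print(n, state, task_instance)
```

Passing an open log file instead of `SAMPLE` works the same way, since the function only iterates over lines.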
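More details:
I am using the LocalExecutor.
Things I have tried:
I tried setting the scheduler threads (max_threads) to 1, changing run_duration from -1 to 300, and increasing dagbag_import_timeout to 200, which is far longer than my DAGs need to load (they take under 3 seconds). I also tried dropping the database completely and re-initializing it.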
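For anyone reproducing these settings, Airflow lets an `airflow.cfg` option be overridden through an environment variable named `AIRFLOW__{SECTION}__{KEY}`. The sketch below prints the overrides for the values tried above. The section names are my assumption based on the Airflow 1.10.x default configuration, and `airflow_env_var` is a hypothetical helper, not an Airflow function.

```python
def airflow_env_var(section, key):
    """Build the environment variable name that overrides [section] key."""
    return "AIRFLOW__{}__{}".format(section.upper(), key.upper())


# (section, key, value) for the settings tried above.
# Section names are assumed from the Airflow 1.10.x default airflow.cfg.
TRIED = [
    ("scheduler", "max_threads", "1"),
    ("scheduler", "run_duration", "300"),
    ("core", "dagbag_import_timeout", "200"),
]

for section, key, value in TRIED:
    print("export {}={}".format(airflow_env_var(section, key), value))
```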
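Edit:
I went through the source code and made a change that lets Airflow run smoothly. Unfortunately, it also stops Airflow from handling cancelled tasks correctly: if anything is already queued, the queued tasks will still run with this change in place.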
diff --git a/airflow/jobs/scheduler_job.py b/airflow/jobs/scheduler_job.py
index a6b42bc..0c79f46 100644
--- a/airflow/jobs/scheduler_job.py
+++ b/airflow/jobs/scheduler_job.py
@@ -1097,7 +1097,7 @@ class SchedulerJob(BaseJob):
                 ignore_all_deps=False,
                 ignore_depends_on_past=False,
                 ignore_task_deps=False,
-                ignore_ti_state=False,
+                ignore_ti_state=True,
                 pool=simple_task_instance.pool,
                 file_path=simple_dag.full_filepath,
                 pickle_id=simple_dag.pickle_id)
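The trade-off in that one-line change can be illustrated with a toy model. This is not Airflow's code: `RUNNABLE_STATES` and `state_check_passes` are hypothetical. The only behaviour taken from the logs above is that 'scheduled' is rejected. That 'queued' is accepted is my assumption.

```python
# Toy model only; NOT Airflow's implementation.
# Assumption: 'queued' passes the state check, 'scheduled' does not
# (the latter matches the log message quoted in the question).
RUNNABLE_STATES = {"queued"}


def state_check_passes(ti_state, ignore_ti_state=False):
    """Mimic a 'Task Instance State' style check for one task instance."""
    if ignore_ti_state:
        # Patched behaviour: the state is never inspected, so a task whose
        # state was changed after queueing (e.g. cancelled) would still run.
        return True
    return ti_state in RUNNABLE_STATES


for state in ("queued", "scheduled", "failed"):
    print(state, state_check_passes(state), state_check_passes(state, ignore_ti_state=True))
```

With the flag off, anything not in the allowed set is turned away, which is the repeated rejection seen in the logs. With the flag on, every state is let through, which is why cancelled tasks are no longer stopped.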