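An Airflow (v1.10.5) DAG that runs fine with the SequentialExecutor now has many (though not all) simple tasks that fail without any log information when run with the LocalExecutor, even with very low parallelism settings, e.g.: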
<airflow.cfg>
# overall task concurrency limit for airflow
parallelism = 8 # which is same as number of cores shown by lscpu
# max tasks per dag
dag_concurrency = 2
# max instances of a given dag that can run on airflow
max_active_runs_per_dag = 1
# max threads used per worker / core
max_threads = 2
# 40G of RAM available total
# CPUs: 8 (sockets 2, cores per socket 4)
see https://www.astronomer.io/guides/airflow-scaling-workers/
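To double-check what these settings actually resolve to, they can be read back with Python's standard configparser. This is a minimal sketch, not Airflow's own config loader. It assumes the keys sit in the [core] and [scheduler] sections, as in the Airflow 1.10 default airflow.cfg, because the post does not show the section headers. It also sets inline_comment_prefixes so that the trailing "# which is same..." comment on the parallelism line is not read as part of the value. Airflow's own parser may not strip such inline comments. The per_dag_task_bound helper is hypothetical and gives only a rough reading: with a single DAG, running task instances are capped by both parallelism and dag_concurrency.

```python
import configparser
import os

# Assumed layout: in the Airflow 1.10 default airflow.cfg, parallelism,
# dag_concurrency and max_active_runs_per_dag are under [core] and
# max_threads is under [scheduler]. The post does not show section headers.
SAMPLE_CFG = """
[core]
parallelism = 8 # which is same as number of cores shown by lscpu
dag_concurrency = 2
max_active_runs_per_dag = 1

[scheduler]
max_threads = 2
"""


def load_limits(text):
    """Return the concurrency-related settings as ints."""
    # Strip trailing "# ..." comments so "8 # which is ..." parses as 8.
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    return {
        "parallelism": parser.getint("core", "parallelism"),
        "dag_concurrency": parser.getint("core", "dag_concurrency"),
        "max_active_runs_per_dag": parser.getint("core", "max_active_runs_per_dag"),
        "max_threads": parser.getint("scheduler", "max_threads"),
    }


def per_dag_task_bound(limits):
    """Hypothetical helper: rough cap on running task instances of one DAG."""
    return min(limits["parallelism"], limits["dag_concurrency"])


if __name__ == "__main__":
    limits = load_limits(SAMPLE_CFG)
    for key, value in limits.items():
        print(f"{key} = {value}")
    print(f"os.cpu_count() = {os.cpu_count()}")
    print(f"per-DAG running task bound ~ {per_dag_task_bound(limits)}")
```

With the values above, the rough per-DAG bound is min(8, 2) = 2 concurrently running task instances. This is well below what 8 CPUs should handle, which fits the question's point that the failures happen at low parallelism.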
Looking at airflow-webserver.* shows nothing unusual, but airflow-scheduler.out shows...
[airflow@airflowetl airflow]$ tail -n 20 airflow-scheduler.out
....
[2019-12-18 11:29:17,773] {scheduler_job.py:1283} INFO - Executor reports execution of mydag.task_level1_table1 execution_date=2019-12-18 21:21:48.424900+00:00 exited with status failed for try_number 1
[2019-12-18 11:29:17,779] {scheduler_job.py:1283} INFO - Executor reports execution of mydag.task_level1_table2 execution_date=2019-12-18 21:21:48.424900+00:00 exited with status failed for try_number 1
[2019-12-18 11:29:17,782] {scheduler_job.py:1283} INFO - Executor reports execution of mydag.task_level1_table3 execution_date=2019-12-18 21:21:48.424900+00:00 exited with status failed for try_number 1
[2019-12-18 11:29:18,833] {scheduler_job.py:832} WARNING - Set 1 task instances to state=None as their associated DagRun was not in RUNNING state
[2019-12-18 11:29:18,844] {scheduler_job.py:1283} INFO - Executor reports execution of mydag.task_level1_table4 execution_date=2019-12-18 21:21:48.424900+00:00 exited with status success for try_number 1
....
But I'm not sure what to make of this.
Does anyone know what could be going on here, or how to get more useful debugging information?
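One way to get more out of airflow-scheduler.out is to pull out the "Executor reports execution of ..." lines and tally them by status. This is a minimal sketch. The regular expression is an assumption written against the v1.10.5 message format shown above, not a documented log format. It keeps "mydag.task_level1_table1" as a single string because DAG ids and task ids may themselves contain dots.

```python
import re
from collections import Counter

# Assumed pattern, based on the scheduler_job.py lines quoted in the question.
EXECUTOR_REPORT = re.compile(
    r"Executor reports execution of (?P<task>\S+) "
    r"execution_date=(?P<execution_date>.+?) "
    r"exited with status (?P<status>\w+) for try_number (?P<try_number>\d+)"
)

# Sample lines copied from the scheduler log above.
SAMPLE_LOG = [
    "[2019-12-18 11:29:17,773] {scheduler_job.py:1283} INFO - Executor reports execution of mydag.task_level1_table1 execution_date=2019-12-18 21:21:48.424900+00:00 exited with status failed for try_number 1",
    "[2019-12-18 11:29:17,779] {scheduler_job.py:1283} INFO - Executor reports execution of mydag.task_level1_table2 execution_date=2019-12-18 21:21:48.424900+00:00 exited with status failed for try_number 1",
    "[2019-12-18 11:29:17,782] {scheduler_job.py:1283} INFO - Executor reports execution of mydag.task_level1_table3 execution_date=2019-12-18 21:21:48.424900+00:00 exited with status failed for try_number 1",
    "[2019-12-18 11:29:18,833] {scheduler_job.py:832} WARNING - Set 1 task instances to state=None as their associated DagRun was not in RUNNING state",
    "[2019-12-18 11:29:18,844] {scheduler_job.py:1283} INFO - Executor reports execution of mydag.task_level1_table4 execution_date=2019-12-18 21:21:48.424900+00:00 exited with status success for try_number 1",
]


def summarize(lines):
    """Count executor-reported statuses and list failed (task, try_number) pairs."""
    statuses = Counter()
    failed = []
    for line in lines:
        match = EXECUTOR_REPORT.search(line)
        if match is None:
            continue  # skip lines that are not executor reports
        statuses[match["status"]] += 1
        if match["status"] == "failed":
            failed.append((match["task"], int(match["try_number"])))
    return statuses, failed


if __name__ == "__main__":
    statuses, failed = summarize(SAMPLE_LOG)
    print(dict(statuses))
    for task, try_number in failed:
        print(f"failed: {task} (try {try_number})")
```

On the real file, use "with open('airflow-scheduler.out') as f: statuses, failed = summarize(f)" to get the failing task instances. One of them could then be rerun in the foreground with the 1.10 CLI, e.g. airflow test mydag task_level1_table1 2019-12-18. That command should print the task's own output to the console instead of going through the LocalExecutor.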
Answer (score: 0)
Looking at my lscpu specs again, I found...
[airflow@airflowetl airflow]$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 2
Notice Thread(s) per core: 1.
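The topology numbers above are self-consistent: 2 sockets × 4 cores per socket × 1 thread per core = 8 logical CPUs, which matches CPU(s): 8. With 1 thread per core, each of those 8 CPUs is a physical core with no hyperthreading. The sketch below parses lscpu's "key: value" output and checks that product. The sample text is the output above. On a real host it could instead come from subprocess.run(["lscpu"], capture_output=True, text=True).stdout (Python 3.7+). The sketch only reads hardware numbers. Whether they have anything to do with Airflow's max_threads is the open question this answer ends on.

```python
# lscpu output copied from the answer above.
LSCPU_OUTPUT = """\
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 2
"""


def parse_lscpu(text):
    """Turn lscpu's 'Key: value' lines into a dict of stripped strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            info[key.strip()] = value.strip()
    return info


def logical_cpus(info):
    """Sockets x cores per socket x threads per core."""
    return (
        int(info["Socket(s)"])
        * int(info["Core(s) per socket"])
        * int(info["Thread(s) per core"])
    )


if __name__ == "__main__":
    info = parse_lscpu(LSCPU_OUTPUT)
    print(f"threads per core: {info['Thread(s) per core']}")
    print(f"computed logical CPUs: {logical_cpus(info)}, reported: {info['CPU(s)']}")
```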
Looking at my airflow.cfg settings, I saw max_threads = 2. Setting max_threads = 1 and restarting the scheduler seems to have fixed the problem.
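If anyone knows what exactly is going wrong under the hood (e.g., why the tasks fail instead of simply waiting for another thread to become available), I'd be interested to hear it.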