I have become quite familiar with Airflow's programming features by working through many examples. What I haven't been able to dig into is how it does its work without overloading the CPU or RAM, and whether there is a way to control the load so that resources are not exhausted.
One approach I know of is to reduce how often the scheduler re-parses DAG files and schedules tasks, by raising the values of the fields min_file_process_interval and scheduler_heartbeat_sec to an interval of about a minute. Although this reduces the constant CPU usage, once the interval elapses (i.e., after a minute) CPU usage suddenly jumps back to the 95% seen at startup. Is there at least a way to keep it from consuming more than, say, 70% CPU?
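The two settings mentioned above live in the [scheduler] section of airflow.cfg. A minimal sketch of the tuning described (the 60-second value is just the "about a minute" interval from the question, not a recommendation):

```ini
[scheduler]
# Minimum number of seconds before the same DAG file is re-parsed.
# Raising this reduces the scheduler's constant file-processing load.
min_file_process_interval = 60

# Number of seconds between scheduler heartbeats. A larger value means
# less frequent scheduling work, at the cost of slower task pickup.
scheduler_heartbeat_sec = 60
```

Note that these settings trade scheduling latency for CPU load; they do not cap CPU usage, which is why the spike returns when the interval elapses.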
Edit: also, when the scheduler_heartbeat interval elapses, I see all of my Python scripts being executed again. Is that how this is supposed to work? I thought it would only pick up new DAGs after the interval; otherwise this would not help at all.
Answer 0 (score: 3)
There are a few techniques you can use to control the number of processes running in Airflow: concurrency and max_active_runs, which are defined when you initialize a DAG, and the CeleryExecutor. You can have the CeleryExecutor execute on remote machines. [Didn't try this myself, but I have heard success stories with it.] These are the ones I have used. You'll have to be smart about the allocation to control CPU spikes and memory issues.