How can I increase the number of tasks queued per second?

Time: 2018-02-01 16:52:29

Tags: airflow airflow-scheduler

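I'm trying to diagnose an underperforming Airflow pipeline, and I'd like to know what kind of performance I should expect from the Airflow scheduler, in terms of something like "tasks scheduled per second".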

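I have several jobs queued up, and many of my tasks finish within a few seconds, so I suspect that the scheduler is the limiting component and that my mistake is having lots of quick tasks. Still, I'd rather not rewrite my DAGs if I can avoid it.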

What can I do to increase the rate at which the scheduler queues tasks?

Pipeline details

Here is what my current airflow.cfg looks like.

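I only have two DAGs running. One is scheduled every 5 minutes, and the other is triggered, rarely, by the first. I'm currently trying to backfill several years at this frequency, but I may need to change my approach.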

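As for worker nodes: I currently have 4 fairly powerful servers, with disk, network, CPU, RAM, and swap usage all below 10%. Shutting down 3 of the workers had no effect on my task throughput, and the remaining server barely registered a change in workload.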
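
For a rough sense of scale, the backfill described above can be sized with a back-of-the-envelope sketch. The three-year span, the tasks-per-run figure, and the candidate queueing rates are placeholder assumptions for illustration, not measurements from this pipeline.

```python
from datetime import timedelta

# Assumptions for illustration only: "several years" is taken as 3,
# and each DAG run is assumed to contain 10 tasks.
YEARS_TO_BACKFILL = 3
TASKS_PER_RUN = 10
SCHEDULE_INTERVAL = timedelta(minutes=5)


def dag_runs_in(period, interval):
    """Number of whole schedule intervals that fit in the period."""
    return period // interval


def hours_needed(total_tasks, tasks_per_second):
    """Wall-clock hours to get through total_tasks at a steady queueing rate."""
    return total_tasks / tasks_per_second / 3600


runs = dag_runs_in(timedelta(days=365 * YEARS_TO_BACKFILL), SCHEDULE_INTERVAL)
total_tasks = runs * TASKS_PER_RUN

# Hypothetical scheduler rates, in tasks queued per second.
for rate in (1, 5, 20):
    print(f"{rate:>2} tasks/s: {hours_needed(total_tasks, rate):.1f} hours")
```

With a 5-minute interval, every year of history adds 105,120 DAG runs, so the queueing rate dominates the total backfill time when individual tasks finish in seconds.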

1 answer:

Answer 0: (score: 7)

There are a number of config values in your airflow.cfg that could be related to this.

Under [core]:

  • parallelism: Total number of task instances that can run at once.
  • dag_concurrency: Limit of task instances that can run at once per DAG, counted across all of its active runs. You may need to bump it if you have many parallel tasks. Can be overridden when defining a DAG.
  • non_pooled_task_slot_count: Limit of tasks without a pool configured that can run at once.
  • max_active_runs_per_dag: Maximum number of active DAG runs per DAG. This matters if you're triggering runs manually or there's a backlog of DAG runs scheduled with a short interval. Can be overridden when defining a DAG.
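
As a quick way to see which of these task-level limits bites first, here is a minimal sketch using only the standard library. The config excerpt and its values are hypothetical, and the helper names `read_core_limits` and `tightest_limit` are made up for this example. It assumes a single DAG whose tasks use no pools, in which case the number of running tasks cannot exceed the smallest of the three settings. The DAG-run limit is left out because it counts runs rather than task instances.

```python
import configparser

# Hypothetical excerpt of an airflow.cfg; the values are illustrative only.
SAMPLE_CFG = """
[core]
parallelism = 32
dag_concurrency = 16
non_pooled_task_slot_count = 128
"""

# The [core] settings from the list above that cap running task instances.
TASK_LIMITS = ("parallelism", "dag_concurrency", "non_pooled_task_slot_count")


def read_core_limits(cfg_text):
    """Parse the [core] section and return the task-level limits as ints."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return {name: parser.getint("core", name) for name in TASK_LIMITS}


def tightest_limit(limits):
    """Return (name, value) of the smallest limit."""
    name = min(limits, key=limits.get)
    return name, limits[name]


limits = read_core_limits(SAMPLE_CFG)
name, value = tightest_limit(limits)
print(f"Tightest limit: {name} = {value}")
```

If the tightest value is well below the number of tasks you expect to run in parallel, that setting is a reasonable first candidate to raise.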

Under [scheduler]:

Under [celery]:

  • celeryd_concurrency: Number of worker processes each Celery worker runs with, so essentially the number of task instances a worker can take at once. Matching the number of CPUs is a popular starting point, but you can definitely go higher.

The last one applies only if you're using the CeleryExecutor, which I'd definitely recommend if you're looking to increase your task throughput.
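
To see how celeryd_concurrency interacts with parallelism, here is a small sketch. The helper names and the numbers (4 workers, 16 processes each, parallelism of 32) are assumptions for illustration, not values taken from the question's config. It only encodes the two limits described above: each worker takes at most celeryd_concurrency task instances, and the whole installation runs at most parallelism task instances.

```python
import os


def suggested_celeryd_concurrency(cpu_count=None):
    """Starting point mentioned above: one worker process per CPU."""
    cpus = cpu_count if cpu_count is not None else os.cpu_count()
    # os.cpu_count() can return None, so fall back to 1.
    return max(1, cpus or 1)


def effective_slots(num_workers, concurrency_per_worker, parallelism):
    """Task instances that can run at once, given both limits."""
    celery_slots = num_workers * concurrency_per_worker
    return min(celery_slots, parallelism)


# Hypothetical numbers: 4 workers with 16 processes each, parallelism = 32.
print(effective_slots(4, 16, 32))  # capped by parallelism
print(effective_slots(1, 16, 32))  # capped by the single worker
print(suggested_celeryd_concurrency())
```

When the Celery side offers more slots than parallelism allows, adding workers or raising celeryd_concurrency will not increase throughput until parallelism is raised as well.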