Airflow: too many MySQL connections from Celery workers

Time: 2019-08-29 23:05:42

Tags: mysql celery airflow

We are running Airflow 1.10.1 with Celery and are seeing a large number of open MySQL connections. When a DAG starts, the UI hangs for several minutes.

Key points:

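  • All nodes are bare metal: 40 CPUs, 2494.015 MHz, 378 GB RAM, 10 Gb NIC
  • Database connections are not reused
  • Connections stay open even though only 5 are active
  • Workers create hundreds of connections that stay open until the database clears them (900 seconds)
  • Each worker runs 100 Celery threads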

mysql> show global status like 'Thread%';

+-------------------------+---------+
| Variable_name           | Value   |
+-------------------------+---------+
| Threadpool_idle_threads | 0       |
| Threadpool_threads      | 0       |
| Threads_cached          | 775     |
| Threads_connected       | 5323    |
| Threads_created         | 4846609 |
| Threads_running         | 5       |
+-------------------------+---------+

MySQL connections per host:

31  - worker1
215 - worker2
349 - worker53
335 - worker54
347 - worker55
336 - worker56
336 - worker57
354 - worker58
339 - worker59
328 - worker60
333 - worker61
337 - worker62
2   - scheduler
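
To put the list above in perspective, here is a rough back-of-the-envelope sketch in Python. The counts are copied from the list; worker_concurrency = 100 and sql_alchemy_pool_size = 5 come from the worker config in the question. The ceiling calculation rests on an assumption that is not confirmed anywhere in this thread: that each of the 100 task slots ends up with its own SQLAlchemy pool rather than sharing one pool per worker. The threshold of 300 used to pick out the busy workers is also just an illustrative choice.

```python
# Connection counts per host, copied from the list in the question.
connections = {
    "worker1": 31, "worker2": 215, "worker53": 349, "worker54": 335,
    "worker55": 347, "worker56": 336, "worker57": 336, "worker58": 354,
    "worker59": 339, "worker60": 328, "worker61": 333, "worker62": 337,
    "scheduler": 2,
}

WORKER_CONCURRENCY = 100  # [celery] worker_concurrency (worker config)
POOL_SIZE = 5             # [core] sql_alchemy_pool_size (worker config)

# Total connections across all listed hosts.
total = sum(connections.values())

# Hosts holding more than 300 connections (illustrative threshold).
busy = {host: n for host, n in connections.items() if n > 300}
avg_busy = sum(busy.values()) / len(busy)

# Average connections held per task slot on a busy worker.
per_slot = avg_busy / WORKER_CONCURRENCY

# ASSUMPTION: one pool per task slot, so the pool alone (ignoring any
# overflow connections) would allow up to this many per worker.
ceiling_per_worker = WORKER_CONCURRENCY * POOL_SIZE

print(f"total listed connections: {total}")
print(f"busy workers: {len(busy)}, average {avg_busy:.1f} connections each")
print(f"about {per_slot:.2f} connections per task slot")
print(f"assumed pool ceiling per worker: {ceiling_per_worker}")
```

If that assumption holds, a pool size of 5 does not cap a worker at 5 connections; it caps each slot, which would be consistent with the busy workers sitting at several hundred connections each.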

Worker .cfg:

[core]
sql_alchemy_pool_size = 5
sql_alchemy_pool_recycle = 900
sql_alchemy_reconnect_timeout = 300
parallelism = 1200
dag_concurrency = 800
non_pooled_task_slot_count = 1200
max_active_runs_per_dag = 10
dagbag_import_timeout = 30
[celery]
worker_concurrency = 100

Scheduler .cfg:

    [core]
    sql_alchemy_pool_size = 30
    sql_alchemy_pool_recycle = 300
    sql_alchemy_reconnect_timeout = 300
    parallelism = 1200
    dag_concurrency = 800
    non_pooled_task_slot_count = 1200
    max_active_runs_per_dag = 10
    [scheduler]
    job_heartbeat_sec = 5
    scheduler_heartbeat_sec = 5
    run_duration = 1800
    min_file_process_interval = 10
    min_file_parsing_loop_time = 1
    dag_dir_list_interval = 300
    print_stats_interval = 30
    scheduler_zombie_task_threshold = 300
    max_tis_per_query = 1024
    max_threads = 29

To add: I am running 1000 simple tasks such as sleep or ls.

1 answer:

Answer 0: (score: 0)

We were able to bring the number of connections down from 700-800 to 1-10.

There are two things you can do:

  1. Set sql_alchemy_pool_enabled = False
  2. Set a result_backend that is different from the database. In our case we use Redis as the result_backend and MySQL as the main database.
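
As a minimal sketch of where those two settings might sit, the snippet below holds a cfg fragment in a Python string and parses it with the standard-library configparser to check that it is well-formed. The section names are an assumption based on the 1.10-style layout used in the question (sql_alchemy_pool_enabled under [core], result_backend under [celery]), and the Redis URL is a hypothetical placeholder, not the answerer's actual value.

```python
import configparser

# Hypothetical airflow.cfg fragment. Section placement is an assumption;
# the Redis host, port and database number are placeholders.
AIRFLOW_CFG_FRAGMENT = """
[core]
# Turn off SQLAlchemy connection pooling for the metadata database.
sql_alchemy_pool_enabled = False

[celery]
# Keep Celery task results in Redis instead of the MySQL database.
result_backend = redis://localhost:6379/0
"""


def load_fragment(text: str) -> configparser.ConfigParser:
    """Parse a cfg fragment so the keys and values can be inspected."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


cfg = load_fragment(AIRFLOW_CFG_FRAGMENT)
pool_enabled = cfg.getboolean("core", "sql_alchemy_pool_enabled")
backend = cfg.get("celery", "result_backend")

print(f"pooling enabled: {pool_enabled}")
print(f"result backend: {backend}")
```

The same two keys would need to be applied on every worker, since each worker reads its own config file.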