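We moved to puckel/Airflow-1.10.2 to try to resolve the poor performance we have been seeing in multiple environments. We are running Airflow 1.10.2 on AWS ECS. Interestingly, CPU and memory usage never go above 80%, and utilization of the Airflow metadata DB is also very low.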
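Below I've listed the configuration we are using, the DagBag parsing times, and the detailed execution times from the cProfile output of running just DagBag() in plain Python.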
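Some of our DAGs import a function from create_subdag_functions.py that returns a DAG; we use this function in 12 DAGs. Most of these DAGs and their corresponding subdags run only hourly, but 1 DAG (with 3 subdags) runs every 10 minutes.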
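Because each DAG file is executed as a Python module when it is parsed, one way to see which files are expensive outside of Airflow is to time that execution per file. This is a sketch under assumptions, not what DagBag does internally: it runs each top-level *.py file in a folder once with importlib, puts the folder on sys.path so sibling imports such as create_subdag_functions resolve, and the module-name prefix and file names in the usage example are hypothetical.

```python
import importlib.util
import sys
import tempfile
import time
from pathlib import Path


def time_module_imports(dag_folder):
    """Execute each top-level *.py file in dag_folder once and time it.

    Rough approximation of a per-file parse duration (not Airflow's code):
    every file's top-level code runs in the current process, in name order.
    """
    folder = str(Path(dag_folder).resolve())
    sys.path.insert(0, folder)  # lets DAG files import sibling helper modules
    timings = {}
    try:
        for path in sorted(Path(folder).glob("*.py")):
            # Hypothetical module name, chosen to avoid clashing with real imports.
            spec = importlib.util.spec_from_file_location("timed_" + path.stem, str(path))
            module = importlib.util.module_from_spec(spec)
            start = time.perf_counter()
            spec.loader.exec_module(module)  # runs the file's top-level code
            timings[path.name] = time.perf_counter() - start
    finally:
        sys.path.remove(folder)
    return timings


# Usage with two hypothetical DAG files; one does slow work at import time.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "slow_dag.py").write_text("import time\ntime.sleep(0.2)\n")
    Path(tmp, "fast_dag.py").write_text("x = 1\n")
    timings = time_module_imports(tmp)

for name, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {seconds:.3f}s")
```

In this sketch a shared helper like create_subdag_functions.py is executed once as its own file and once more by the first DAG file that imports it; later imports in the same process are served from Python's sys.modules cache.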
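Configuration:
max_threads = 2
dag_dir_list_interval = 300
dag_concurrency = 16
worker_concurrency = 16
max_active_runs_per_dag = 16
parallelism = 32
executor = CeleryExecutor
Some observations:
- airflow list_dags -r also takes a long time, and the example DAGs still run even though we disabled them. The parse time for each DAG jumps around from run to run.
- When profiling the DagBag() function, we found that DagBag() spends most of its time in airflow.utils.dag_processing.list_py_paths, possibly because there are 50+ SQL files in the /usr/local/airflow/dags folder.
Solutions I've tried:
- Adjusting max_threads
- Adjusting min_file_process_interval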
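To get a feel for how much work a walk of the dags folder does when it holds many non-DAG files, here is a simplified stand-in for DAG-file discovery. It is not Airflow's list_py_paths. It assumes three rules: only .py files are candidates (Airflow can also load zipped DAGs, which this sketch ignores), paths matching a regular expression from .airflowignore are skipped, and in safe mode a file is kept only if it contains both "airflow" and "DAG". The 50 .sql files and the file names in the usage example are hypothetical.

```python
import os
import re
import tempfile
from pathlib import Path


def scan_dag_folder(folder, ignore_patterns=(), safe_mode=True):
    """Walk folder and return (files_visited, candidate_dag_paths).

    Simplified stand-in for DAG discovery (assumptions, not Airflow's code):
    every file is visited whatever its extension; only *.py files are kept;
    paths matching any regex in ignore_patterns are dropped; with safe_mode,
    a file must contain both b"airflow" and b"DAG" to be kept.
    """
    compiled = [re.compile(p) for p in ignore_patterns]
    visited, candidates = 0, []
    for root, _dirs, files in os.walk(folder, followlinks=True):
        for name in files:
            visited += 1
            path = os.path.join(root, name)
            if not name.endswith(".py"):
                continue
            if any(p.search(path) for p in compiled):
                continue
            if safe_mode:
                with open(path, "rb") as fh:
                    content = fh.read()
                if not (b"airflow" in content and b"DAG" in content):
                    continue
            candidates.append(path)
    return visited, sorted(candidates)


with tempfile.TemporaryDirectory() as tmp:
    for i in range(50):  # hypothetical stand-ins for the 50+ .sql files
        Path(tmp, f"query_{i}.sql").write_text("SELECT 1;\n")
    Path(tmp, "dag1.py").write_text("from airflow import DAG\n")
    Path(tmp, "helpers.py").write_text("def helper():\n    return 1\n")
    visited, kept = scan_dag_folder(tmp)
    _, kept_with_ignore = scan_dag_folder(tmp, ignore_patterns=[r"dag1\.py$"])

print(f"visited {visited} files, kept {[os.path.basename(p) for p in kept]}")
```

The point of counting visited versus kept files is that the walk touches every file in the folder, so non-DAG files still cost something on each listing even though none of them is ever parsed as a DAG.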
DagBag loading stats for /usr/local/airflow/dags
-------------------------------------------------------------------
Number of DAGs: 42
Total task number: 311
DagBag parsing time: 189.77048399999995
--------------------------------------------+--------------------+---------+----------+------------------------------------------------------------------------------------------------------------
file | duration | dag_num | task_num | dags
--------------------------------------------+--------------------+---------+----------+------------------------------------------------------------------------------------------------------------
/dag1.py | 60.576728 | 1 | 21 | ['dag1']
/dag2.py | 55.092603999999994 | 1 | 28 | ['dag2']
/dag3.py | 47.997972000000004 | 1 | 17 | ['dag3']
/dag4.py | 22.99313 | 3 | 16 | ['dag4', 'dag4.subdag1', 'dag4.subdag2']
/dag5.py | 0.67 | 1 | 21 | ['dag5']
/dag6.py | 0.652114 | 1 | 9 | ['dag6']
/dag7.py | 0.45368 | 1 | 26 | ['dag7']
/dag8.py | 0.396908 | 5 | 40 | ['dag8', 'dag8.subdag1', 'dag8.subdag2', 'dag8.subdag3', 'dag8.subdag4']
/dag9.py | 0.242012 | 6 | 38 | ['dag9', 'dag9.subdag1', 'dag9.subdag2', 'dag9.subdag3', 'dag9.subdag4', 'dag9.subdag5']
/dag10.py | 0.134342 | 1 | 1 | ['dag10']
/dag11.py | 0.13325 | 2 | 8 | ['dag11', 'dag12.subdag1']
/dag12.py | 0.10562 | 1 | 6 | ['dag12']
/create_subdag_functions.py | 0.105292 | 0 | 0 | []
example_http_operator.py | 0.040636 | 1 | 6 | ['example_http_operator']
example_subdag_operator.py | 0.005328 | 3 | 15 | ['example_subdag_operator', 'example_subdag_operator.section-1', 'example_subdag_operator.section-2']
example_bash_operator.py | 0.004052 | 1 | 6 | ['example_bash_operator']
example_branch_operator.py | 0.003444 | 1 | 11 | ['example_branch_operator']
example_branch_python_dop_operator_3.py | 0.003418 | 1 | 3 | ['example_branch_dop_operator_v3']
example_passing_params_via_test_command.py | 0.003222 | 1 | 2 | ['example_passing_params_via_test_command']
example_skip_dag.py | 0.002386 | 1 | 8 | ['example_skip_dag']
example_trigger_controller_dag.py | 0.002386 | 1 | 1 | ['example_trigger_controller_dag']
example_short_circuit_operator.py | 0.002344 | 1 | 6 | ['example_short_circuit_operator']
example_python_operator.py | 0.002218 | 1 | 6 | ['example_python_operator']
example_latest_only.py | 0.002196 | 1 | 2 | ['latest_only']
example_latest_only_with_trigger.py | 0.001848 | 1 | 5 | ['latest_only_with_trigger']
example_xcom.py | 0.001722 | 1 | 3 | ['example_xcom']
docker_copy_data.py | 0.001718 | 0 | 0 | []
example_trigger_target_dag.py | 0.001704 | 1 | 2 | ['example_trigger_target_dag']
tutorial.py | 0.00165 | 1 | 3 | ['tutorial']
test_utils.py | 0.001376 | 1 | 1 | ['test_utils']
example_docker_operator.py | 0.00103 | 0 | 0 | []
subdags/subdag.py | 0.001016 | 0 | 0 | []
-------------------------------------------------------------------------------------------------------+--------------------+---------+----------+--------------------------------------------------
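Some worked arithmetic on the first output helps explain the total. The durations below are copied from that table. The wall-time estimate additionally assumes, for illustration only, that the scheduler parses files with max_threads = 2 processes in parallel and that every file takes the time measured here, which the second run shows is not stable. It uses longest-processing-time-first (LPT) assignment, a standard greedy heuristic, not Airflow's actual processing order.

```python
import heapq

# Per-file durations (seconds) from the first DagBag output; the example
# DAGs are left out because each takes well under 0.1 s.
durations = {
    "/dag1.py": 60.576728,
    "/dag2.py": 55.092603999999994,
    "/dag3.py": 47.997972000000004,
    "/dag4.py": 22.99313,
    "/dag5.py": 0.67,
    "/dag6.py": 0.652114,
    "/dag7.py": 0.45368,
    "/dag8.py": 0.396908,
    "/dag9.py": 0.242012,
    "/dag10.py": 0.134342,
    "/dag11.py": 0.13325,
    "/dag12.py": 0.10562,
    "/create_subdag_functions.py": 0.105292,
}
reported_total = 189.77048399999995  # "DagBag parsing time" in that output

top4 = sorted(durations.values(), reverse=True)[:4]
top4_share = sum(top4) / reported_total


def lpt_makespan(times, workers):
    """Give each job, longest first, to the least-loaded worker;
    return the busiest worker's total load."""
    loads = [0.0] * workers  # all zeros, so already a valid min-heap
    for t in sorted(times, reverse=True):
        heapq.heapreplace(loads, loads[0] + t)  # pop smallest load, push it + t
    return max(loads)


makespan = lpt_makespan(durations.values(), workers=2)  # assumes max_threads = 2
print(f"top 4 files: {top4_share:.1%} of parsing time")  # → 98.4%
print(f"estimated wall time with 2 parallel parsers: {makespan:.1f}s")  # → 103.1s
```

Under these assumptions the four slowest files dominate, and the slowest pair that lands on the same parser bounds the whole pass, so adding parsers helps far less than making those four files cheaper to import.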
Note: for brevity, the example DAGs have been removed from the second output.
-------------------------------------------------------------------
DagBag loading stats for /usr/local/airflow/dags
-------------------------------------------------------------------
Number of DAGs: 42
Total task number: 311
DagBag parsing time: 296.5826819999999
------------------------------+--------------------+---------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
file | duration | dag_num | task_num | dags
------------------------------+--------------------+---------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
/dag1.py | 74.819988 | 1 | 21 | ['dag1']
/dag3.py | 53.193430000000006 | 1 | 17 | ['dag3']
/dag8.py | 34.535742 | 5 | 40 | ['dag8', 'dag8.subdag1', 'dag8.subdag2', 'dag8.subdag3', 'dag8.subdag4']
/dag9.py | 21.543944000000003 | 6 | 38 | ['dag9', 'dag9.subdag1', 'dag9.subdag2', 'dag9.subdag3', 'dag9.subdag4', 'dag9.subdag5']
/dag4.py | 18.458316000000003 | 3 | 16 | ['dag4', 'dag4.subdag1', 'dag4.subdag2']
/create_subdag_functions.py | 14.652806000000002 | 0 | 0 | []
/dag11.py | 13.051984000000001 | 2 | 8 | ['dag11', 'dag11.subdag1']
/dag5.py | 10.02703 | 1 | 21 | ['dag5']
/dag10.py | 9.834226000000001 | 1 | 1 | ['dag10']
/dag2.py | 9.575258000000002 | 1 | 28 | ['dag2']
/dag6.py | 9.418897999999999 | 1 | 9 | ['dag6']
/dag12.py | 9.319210000000002 | 1 | 6 | ['dag12']
/dag7.py | 8.686964 | 1 | 26 | ['dag7']
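To quantify how much per-file parse times jump around between the two outputs, rows can be parsed and compared. This sketch copies five rows that appear in both outputs, keys each row by its first DAG id (or by file name when the DAG list is empty), and sorts by slowdown factor. The parse_dagbag_table helper is hypothetical and assumes the "file | duration | dag_num | task_num | dags" layout shown above.

```python
import ast

FIRST_RUN = """
/dag1.py | 60.576728 | 1 | 21 | ['dag1']
/dag3.py | 47.997972000000004 | 1 | 17 | ['dag3']
/dag8.py | 0.396908 | 5 | 40 | ['dag8', 'dag8.subdag1', 'dag8.subdag2', 'dag8.subdag3', 'dag8.subdag4']
/dag12.py | 0.10562 | 1 | 6 | ['dag12']
/create_subdag_functions.py | 0.105292 | 0 | 0 | []
"""

SECOND_RUN = """
/dag1.py | 74.819988 | 1 | 21 | ['dag1']
/dag3.py | 53.193430000000006 | 1 | 17 | ['dag3']
/dag8.py | 34.535742 | 5 | 40 | ['dag8', 'dag8.subdag1', 'dag8.subdag2', 'dag8.subdag3', 'dag8.subdag4']
/create_subdag_functions.py | 14.652806000000002 | 0 | 0 | []
/dag12.py | 9.319210000000002 | 1 | 6 | ['dag12']
"""


def parse_dagbag_table(text):
    """Map each table row to its parse duration in seconds.

    Rows are keyed by their first DAG id, or by file name when the DAG list
    is empty; header and separator lines are skipped.
    """
    durations = {}
    for line in text.splitlines():
        cols = [c.strip() for c in line.split("|")]
        if len(cols) != 5 or cols[0] == "file":
            continue
        dag_ids = ast.literal_eval(cols[4])  # e.g. "['dag1']" -> ['dag1']
        key = dag_ids[0] if dag_ids else cols[0]
        durations[key] = float(cols[1])
    return durations


first = parse_dagbag_table(FIRST_RUN)
second = parse_dagbag_table(SECOND_RUN)
slowdown = {k: second[k] / first[k] for k in first.keys() & second.keys()}
for key in sorted(slowdown, key=slowdown.get, reverse=True):
    print(f"{key}: {first[key]:.2f}s -> {second[key]:.2f}s (x{slowdown[key]:.1f})")
```

Keying by DAG id rather than by file label keeps the comparison meaningful even if file names are anonymized differently between two pasted outputs.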
cProfile output of running the following in plain Python:
from airflow.models import DagBag; DagBag()
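A report like this can be reproduced with the standard library's cProfile and pstats modules. In this sketch fake_dagbag and slow_helper are hypothetical stand-ins so that it runs without Airflow; with Airflow installed, the target would be DagBag from airflow.models, as in the command above. Sorting by cumulative time is what surfaces the functions, such as list_py_paths, under which most of the time is spent.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)  # keep only the top entries
    return result, buf.getvalue()


def slow_helper(n):
    """Hypothetical CPU-bound helper, standing in for an expensive inner call."""
    return sum(range(n * 100))


def fake_dagbag():
    """Hypothetical stand-in for DagBag(); spends its time in slow_helper."""
    return sum(slow_helper(i) for i in range(200))


# With Airflow installed, the real call would be:
#   from airflow.models import DagBag
#   bag, report = profile_call(DagBag)
bag, report = profile_call(fake_dagbag)
print(report)
```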
Airflow performance degradation: