Airflow DAG parallel tasks delayed: execution lags by 60 seconds

Time: 2019-03-09 11:28:25

Tags: airflow airflow-scheduler

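We moved to Airflow 1.10.2 to resolve a CPU usage issue, and that issue is now fixed in our environment. However, we have observed that although the DAG's tasks are submitted and shown as running on the Airflow dashboard, they hold off on actual processing and appear to sit in the queue for about 60 seconds before execution actually begins. Please note that for our use case: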

  • The Airflow DAGs are not time-dependent, i.e. they are not 'scheduled DAGs' but are triggered from Python code.
  • Airflow v1.10.2 is used as a single standalone installation [executor = LocalExecutor].

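The Python code watches a directory for arriving files. For every file it observes, the code triggers an Airflow DAG. We receive files in large batches, so at any given moment multiple instances of the same DAG are being invoked. Once triggered, the DAG executes a task that calls Python code to launch a Kubernetes container, where some file-related processing takes place. Please find an excerpt of the DAG code below: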

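import os
from airflow.operators.bash_operator import BashOperator  # Airflow 1.10 import path

# `cons` (project constants) and `dag` are defined elsewhere in the DAG file.
positional_to_ascii = BashOperator(
    task_id="uncompress_the_file",
    bash_command='python3.6 ' + os.path.join(cons.CODE_REPO, 'app/Code/k8Job/create_kubernetes_job.py') + ' POS-PREPROCESSING {{ dag_run.conf["inputfilepath"] }} {{ dag_run.conf["frt_id"]}}',
    execution_timeout=None,
    dag=dag)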

Once this task completes, it triggers another DAG whose job is to process the data output by the previous DAG.
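
For readers who want to picture the trigger side described above, here is a minimal sketch of a directory watcher that starts one DAG run per new file through the Airflow 1.10 CLI (`airflow trigger_dag -c CONF dag_id`). It is not the poster's actual code: the function names, the polling approach, and the placeholder `frt_id` value are all assumptions made for illustration. Only the `inputfilepath` and `frt_id` conf keys come from the DAG excerpt.

```python
import json
import os
import subprocess
import time


def build_trigger_command(dag_id, input_path, frt_id):
    """Build the Airflow 1.10 CLI call that starts one DAG run for one file."""
    # The conf keys match the ones the DAG excerpt reads from dag_run.conf.
    conf = json.dumps({"inputfilepath": input_path, "frt_id": frt_id})
    return ["airflow", "trigger_dag", "-c", conf, dag_id]


def find_new_files(directory, seen):
    """Return files in `directory` not reported before, and record them in `seen`."""
    new_paths = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and path not in seen:
            seen.add(path)
            new_paths.append(path)
    return new_paths


def watch(directory, dag_id, poll_seconds=5):
    """Poll `directory` forever and trigger `dag_id` once per new file."""
    seen = set()
    while True:
        for path in find_new_files(directory, seen):
            # "UNKNOWN" is a placeholder: the post does not say how frt_id is derived.
            subprocess.run(build_trigger_command(dag_id, path, "UNKNOWN"), check=True)
        time.sleep(poll_seconds)
```

With a burst of files, a loop like this creates many DAG runs within seconds, which is the situation in which the scheduler-side delay described in this question becomes visible.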

Below are some of the parameters from our configuration file, which may help in assessing the root cause.

min_file_process_interval = 60 
dag_dir_list_interval = 300 
max_threads = 2
dag_concurrency = 16
worker_concurrency = 16
max_active_runs_per_dag = 16
parallelism = 32
sql_alchemy_conn = mysql://airflow:fewfw324$gG@someXserver:3306/airflow
executor = LocalExecutor

DagBag parsing time: 1.305286. Please also find below a snapshot of the output of the command airflow list_dags -r

-------------------------------------------------------------------
DagBag loading stats for /root/airflow/dags
-------------------------------------------------------------------
Number of DAGs: 7
Total task number: 23
DagBag parsing time: 1.305286
------------------------------+----------+---------+----------+------------------------------
file                          | duration | dag_num | task_num | dags
------------------------------+----------+---------+----------+------------------------------
/trigger_cleansing.py         | 0.876388 |       1 |        5 | ['trigger_cleansing']
/processing_ebcdic_trigger.py | 0.383038 |       1 |        1 | ['processing_ebcdic_trigger']
/prep_preprocess_dag.py       | 0.015474 |       1 |        6 | ['prep_preprocess_dag']
/prep_scale_dag.py            | 0.012098 |       1 |        6 | ['dataprep_scale_dag']
/mvp.py                       | 0.010832 |       1 |        2 | ['dg_a']
/prep_uncompress_dag.py       | 0.004142 |       1 |        2 | ['dataprep_unzip_decrypt_dag']
/prep_positional_trigger.py   | 0.003314 |       1 |        1 | ['prep_positional_trigger']
------------------------------+----------+---------+----------+------------------------------

Below is the status of the Airflow scheduler service, which shows multiple processes

systemctl status airflow-scheduler
● airflow-scheduler.service - Airflow scheduler daemon
   Loaded: loaded (/etc/systemd/system/airflow-scheduler.service; enabled; vendor preset: disabled)
   Active: active (running) since Sat 2019-03-09 04:44:29 EST; 33min ago
 Main PID: 37409 (airflow)
   CGroup: /system.slice/airflow-scheduler.service
           ├─37409 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37684 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37685 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37686 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37687 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37688 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37689 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37690 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37691 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37692 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37693 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37694 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37695 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37696 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37697 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37699 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37700 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37701 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37702 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37703 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37704 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37705 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37706 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37707 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37708 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37709 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37710 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37712 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37713 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37714 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37715 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37717 /usr/bin/python3.6 /bin/airflow scheduler
           ├─37718 /usr/bin/python3.6 /bin/airflow scheduler
           └─37722 /usr/bin/python3.6 /bin/airflow scheduler

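Because numerous files keep arriving in the system, DAGs are constantly being triggered and many DAG tasks end up in a waiting state. Strangely, we did not face this issue when we were using v1.9. Please advise.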

1 answer:

Answer 0 (score: 0)

I realized that in the 'airflow.cfg' file, 'min_file_process_interval' was set to 60. Setting it to zero resolved the problem I reported here.
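
A likely explanation, offered as an interpretation rather than something stated in the post: min_file_process_interval limits how often the scheduler re-processes each DAG file, so with a value of 60 a newly triggered run may wait up to about a minute before its tasks are picked up. The sketch below shows one way to check which value is in effect. The sample config text and the function name are hypothetical. The sketch assumes the option lives in the [scheduler] section and that the environment variable AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL, following Airflow's AIRFLOW__{SECTION}__{KEY} convention, takes precedence over the file.

```python
import configparser

# Hypothetical excerpt of an airflow.cfg, using the values quoted in the question.
SAMPLE_CFG = """
[scheduler]
min_file_process_interval = 60
dag_dir_list_interval = 300
max_threads = 2
"""

ENV_KEY = "AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL"


def effective_interval(cfg_text, environ):
    """Return min_file_process_interval, preferring the environment override."""
    override = environ.get(ENV_KEY)
    if override is not None:
        return int(override)
    # interpolation=None keeps '%' characters elsewhere in a real airflow.cfg
    # from being treated as interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser.getint("scheduler", "min_file_process_interval")


if __name__ == "__main__":
    interval = effective_interval(SAMPLE_CFG, {})
    if interval > 0:
        print("DAG files are re-processed at most every %d seconds" % interval)
```

To check a real installation, pass the contents of airflow.cfg and `os.environ` instead of the sample values. The scheduler most likely needs a restart before a changed value takes effect.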