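We moved to Airflow 1.10.2 to fix a CPU usage problem, and that problem is now resolved in our environment. However, we have noticed that although DAG tasks are submitted and shown as running on the Airflow dashboard, their status lags somewhat behind the actual processing, and after the actual execution has finished they appear to stay in the queue for about 60 seconds. Note that our implementation uses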
executor = LocalExecutor
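A Python script monitors a directory for newly arriving files, and for every file it detects it triggers an Airflow DAG. We receive files in large batches, so at any given moment several runs of the same DAG are being triggered. Each triggered DAG runs a task that calls Python code to launch a Kubernetes container, where the file-specific processing takes place. An excerpt of the DAG code is below: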
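import os
from airflow.operators.bash_operator import BashOperator

# `cons` (a project constants module) and `dag` are defined elsewhere in this DAG file.
positional_to_ascii = BashOperator(
    task_id="uncompress_the_file",
    # Launch the Kubernetes job; the input path and frt_id come from the triggering run's conf.
    bash_command='python3.6 ' + os.path.join(cons.CODE_REPO, 'app/Code/k8Job/create_kubernetes_job.py') + ' POS-PREPROCESSING {{ dag_run.conf["inputfilepath"] }} {{ dag_run.conf["frt_id"] }}',
    execution_timeout=None,
    dag=dag)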
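The file watcher itself is not shown in the post. Below is a minimal polling sketch, assuming the watcher calls the Airflow 1.10 CLI command `airflow trigger_dag` with `-c` to pass the `inputfilepath` and `frt_id` keys that the task above reads from `dag_run.conf`. The function names, the DAG id passed in, and the way `frt_id` is derived from the file name are all hypothetical.

```python
import json
import os
import subprocess
import time


def find_new_files(directory, seen):
    """Return files in `directory` not seen before, recording them in `seen`."""
    new_files = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and path not in seen:
            seen.add(path)
            new_files.append(path)
    return new_files


def build_trigger_command(dag_id, input_path, frt_id):
    """Build an `airflow trigger_dag` command line (Airflow 1.10 CLI syntax)."""
    conf = json.dumps({"inputfilepath": input_path, "frt_id": frt_id})
    return ["airflow", "trigger_dag", "-c", conf, dag_id]


def watch(directory, dag_id, poll_seconds=5.0):
    """Poll `directory` forever and trigger one DAG run per new file."""
    seen = set()
    while True:
        for path in find_new_files(directory, seen):
            # Hypothetical: use the file name without extension as frt_id.
            frt_id = os.path.splitext(os.path.basename(path))[0]
            subprocess.run(build_trigger_command(dag_id, path, frt_id), check=True)
        time.sleep(poll_seconds)
```

Polling keeps the sketch dependency-free; an event-based watcher would trigger sooner but does not change how Airflow schedules the resulting runs.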
When this task completes, another DAG is triggered, whose task processes the output of the previous DAG.
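The post does not say how the second DAG is triggered. If it is done with Airflow 1.10's `TriggerDagRunOperator`, that operator calls `python_callable(context, dag_run_obj)` and creates the downstream run only if the callable returns the object, using its `payload` as the new run's `conf`. A sketch of such a callable, assuming the downstream DAG needs the same `inputfilepath` and `frt_id` keys to locate the previous DAG's output; the function name, task id and DAG id are hypothetical.

```python
def forward_file_conf(context, dag_run_obj):
    """python_callable for TriggerDagRunOperator (Airflow 1.10 style).

    Copies the file-related keys from the current run's conf into the payload
    of the run to be triggered. Returning None skips the trigger.
    """
    conf = context["dag_run"].conf or {}
    if "inputfilepath" not in conf:
        return None  # nothing to hand over, so do not trigger the next DAG
    dag_run_obj.payload = {
        "inputfilepath": conf["inputfilepath"],
        "frt_id": conf.get("frt_id"),
    }
    return dag_run_obj

# Usage inside the DAG file (requires Airflow 1.10; ids are hypothetical):
# from airflow.operators.dagrun_operator import TriggerDagRunOperator
# trigger_next = TriggerDagRunOperator(
#     task_id="trigger_next_dag",
#     trigger_dag_id="next_dag_id",
#     python_callable=forward_file_conf,
#     dag=dag)
# positional_to_ascii >> trigger_next
```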
Below are some of our configuration parameters, which may help in finding the root cause:
min_file_process_interval = 60
dag_dir_list_interval = 300
max_threads = 2
dag_concurrency = 16
worker_concurrency = 16
max_active_runs_per_dag = 16
parallelism = 32
sql_alchemy_conn = mysql://airflow:fewfw324$gG@someXserver:3306/airflow
executor = LocalExecutor
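To see how `min_file_process_interval = 60` on its own could produce a lag of up to about a minute, here is a deliberately simplified model. It assumes each DAG file is re-processed at fixed multiples of the interval and that the scheduler only acts on a finished task during such a pass; real scheduler timing is less regular, and the function name is hypothetical.

```python
import math


def scheduling_delay(event_time, interval, first_pass=0.0):
    """Seconds from event_time until the next DAG-file processing pass.

    Simplified model: passes happen at first_pass + k * interval (k >= 0).
    """
    if interval <= 0:
        # Interval 0: files are re-processed back to back, so this model
        # treats the extra wait as negligible.
        return 0.0
    if event_time <= first_pass:
        return first_pass - event_time
    passes_elapsed = math.ceil((event_time - first_pass) / interval)
    next_pass = first_pass + passes_elapsed * interval
    return next_pass - event_time
```

Under this model a task finishing at t = 61 s waits until the pass at 120 s, and the worst-case extra wait is just under one full interval, which is consistent with the roughly 60-second delay described in the question.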
DagBag parsing time: 1.305286. Below is a snapshot of the output of airflow list_dags -r:
-------------------------------------------------------------------
DagBag loading stats for /root/airflow/dags
-------------------------------------------------------------------
Number of DAGs: 7
Total task number: 23
DagBag parsing time: 1.305286
------------------------------+----------+---------+----------+------------------------------
file | duration | dag_num | task_num | dags
------------------------------+----------+---------+----------+------------------------------
/trigger_cleansing.py | 0.876388 | 1 | 5 | ['trigger_cleansing']
/processing_ebcdic_trigger.py | 0.383038 | 1 | 1 | ['processing_ebcdic_trigger']
/prep_preprocess_dag.py | 0.015474 | 1 | 6 | ['prep_preprocess_dag']
/prep_scale_dag.py | 0.012098 | 1 | 6 | ['dataprep_scale_dag']
/mvp.py | 0.010832 | 1 | 2 | ['dg_a']
/prep_uncompress_dag.py | 0.004142 | 1 | 2 | ['dataprep_unzip_decrypt_dag']
/prep_positional_trigger.py | 0.003314 | 1 | 1 | ['prep_positional_trigger']
------------------------------+----------+---------+----------+------------------------------
Below is the status of the Airflow scheduler service, which shows multiple processes:
systemctl status airflow-scheduler
● airflow-scheduler.service - Airflow scheduler daemon
Loaded: loaded (/etc/systemd/system/airflow-scheduler.service; enabled; vendor preset: disabled)
Active: active (running) since Sat 2019-03-09 04:44:29 EST; 33min ago
Main PID: 37409 (airflow)
CGroup: /system.slice/airflow-scheduler.service
├─37409 /usr/bin/python3.6 /bin/airflow scheduler
├─37684 /usr/bin/python3.6 /bin/airflow scheduler
├─37685 /usr/bin/python3.6 /bin/airflow scheduler
├─37686 /usr/bin/python3.6 /bin/airflow scheduler
├─37687 /usr/bin/python3.6 /bin/airflow scheduler
├─37688 /usr/bin/python3.6 /bin/airflow scheduler
├─37689 /usr/bin/python3.6 /bin/airflow scheduler
├─37690 /usr/bin/python3.6 /bin/airflow scheduler
├─37691 /usr/bin/python3.6 /bin/airflow scheduler
├─37692 /usr/bin/python3.6 /bin/airflow scheduler
├─37693 /usr/bin/python3.6 /bin/airflow scheduler
├─37694 /usr/bin/python3.6 /bin/airflow scheduler
├─37695 /usr/bin/python3.6 /bin/airflow scheduler
├─37696 /usr/bin/python3.6 /bin/airflow scheduler
├─37697 /usr/bin/python3.6 /bin/airflow scheduler
├─37699 /usr/bin/python3.6 /bin/airflow scheduler
├─37700 /usr/bin/python3.6 /bin/airflow scheduler
├─37701 /usr/bin/python3.6 /bin/airflow scheduler
├─37702 /usr/bin/python3.6 /bin/airflow scheduler
├─37703 /usr/bin/python3.6 /bin/airflow scheduler
├─37704 /usr/bin/python3.6 /bin/airflow scheduler
├─37705 /usr/bin/python3.6 /bin/airflow scheduler
├─37706 /usr/bin/python3.6 /bin/airflow scheduler
├─37707 /usr/bin/python3.6 /bin/airflow scheduler
├─37708 /usr/bin/python3.6 /bin/airflow scheduler
├─37709 /usr/bin/python3.6 /bin/airflow scheduler
├─37710 /usr/bin/python3.6 /bin/airflow scheduler
├─37712 /usr/bin/python3.6 /bin/airflow scheduler
├─37713 /usr/bin/python3.6 /bin/airflow scheduler
├─37714 /usr/bin/python3.6 /bin/airflow scheduler
├─37715 /usr/bin/python3.6 /bin/airflow scheduler
├─37717 /usr/bin/python3.6 /bin/airflow scheduler
├─37718 /usr/bin/python3.6 /bin/airflow scheduler
└─37722 /usr/bin/python3.6 /bin/airflow scheduler
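Files keep arriving, so DAGs are being triggered continuously, and a significant number of DAG tasks end up in a waiting state. Strangely, we did not have this problem with v1.9. Please advise.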
Answer
I realized that min_file_process_interval was set to 60 in the airflow.cfg file. Setting it to 0 resolved the issue I reported here.
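To confirm which value the scheduler actually sees, here is a minimal sketch that reads a setting using a simplified version of Airflow's lookup order: an `AIRFLOW__{SECTION}__{KEY}` environment variable first, then airflow.cfg, then a caller-supplied fallback. It does not cover every source Airflow's own config parser consults (for example `_cmd` options), and the function name is hypothetical.

```python
import configparser
import os


def effective_setting(cfg_path, section, key, fallback=None):
    """Look up an Airflow setting with a simplified precedence order.

    1. Environment variable AIRFLOW__{SECTION}__{KEY}
    2. The [section] key entry in the airflow.cfg file at cfg_path
    3. The caller-supplied fallback
    """
    env_name = "AIRFLOW__{}__{}".format(section.upper(), key.upper())
    if env_name in os.environ:
        return os.environ[env_name]
    # airflow.cfg contains literal '%' characters (log formats), so disable interpolation.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cfg_path)  # a missing file is silently ignored
    return parser.get(section, key, fallback=fallback)
```

For example, effective_setting(path_to_airflow_cfg, "scheduler", "min_file_process_interval") would return "60" for the configuration in the question, unless AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL is set. The scheduler must be restarted after changing the value, and with 0 it re-parses DAG files back to back, which can raise its CPU usage.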