Airflow gunicorn [CRITICAL] WORKER TIMEOUT

Date: 2019-11-25 06:15:11

Tags: docker airflow gunicorn amazon-ecs

I'm trying to run Apache Airflow in ECS using a fork of apache/airflow on the v1-10-stable branch. I'm passing the executor, Postgres, and Redis settings to the webserver through env variables:

AIRFLOW__CORE__SQL_ALCHEMY_CONN="postgresql+psycopg2://airflow_user:airflow_password@postgres:5432/airflow_db"
AIRFLOW__CELERY__RESULT_BACKEND="db+postgresql://airflow_user:airflow_password@postgres:5432/airflow_db"
AIRFLOW__CELERY__BROKER_URL="redis://redis_queue:6379/1"
AIRFLOW__CORE__EXECUTOR=CeleryExecutor
FERNET_KEY=46BKJoQYlPPOexq0OhDZnIlNepKFf87WFwLbfzqDDho=
AIRFLOW__CORE__LOAD_EXAMPLES=False
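For reference, these variables might be handed to the webserver container roughly like this (a minimal sketch; the image name, container network, and `webserver` entrypoint argument are assumptions, not taken from the question):

```shell
# Hypothetical docker run for the webserver; image and network names are examples.
docker run -d --name airflow-webserver --network airflow-net -p 8080:8080 \
  -e AIRFLOW__CORE__SQL_ALCHEMY_CONN="postgresql+psycopg2://airflow_user:airflow_password@postgres:5432/airflow_db" \
  -e AIRFLOW__CELERY__RESULT_BACKEND="db+postgresql://airflow_user:airflow_password@postgres:5432/airflow_db" \
  -e AIRFLOW__CELERY__BROKER_URL="redis://redis_queue:6379/1" \
  -e AIRFLOW__CORE__EXECUTOR=CeleryExecutor \
  -e AIRFLOW__CORE__LOAD_EXAMPLES=False \
  my-airflow-fork:latest webserver
```

In an ECS task definition the same settings would go in the container's `environment` list instead.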

I'm using CMD-SHELL [ -f /home/airflow/airflow/airflow-webserver.pid ] as the health check for the ECS container. I can connect to Postgres and Redis from the Docker container, so there's no security group issue either.
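That PID-file check can be sketched as a small shell function (the function name and parameterized path are mine, for illustration). Note it only proves the gunicorn master wrote its PID file, so a container whose workers are all timing out can still report healthy:

```shell
# Hypothetical health-check helper: prints "healthy" if the gunicorn
# master's PID file exists, "unhealthy" otherwise. PID-file path is $1.
check_pidfile() {
  if [ -f "$1" ]; then
    echo healthy
  else
    echo unhealthy
  fi
}
```

This is consistent with what's observed below: ECS marks the task healthy as soon as the file exists, while every actual HTTP request still fails.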

With docker ps I can see the container is healthy, and the container port is mapped to the ec2 instance: 0.0.0.0:32794->8080/tcp

But when I try to open the webserver UI, it doesn't load. Even curl doesn't work: I tried curl localhost:32794 from the ec2-instance and curl localhost:8080 from inside the container, and neither works. telnet works in both cases.

In the container logs I can see the gunicorn workers timing out continuously:

[2019-11-25 05:30:39,236] {__init__.py:51} INFO - Using executor CeleryExecutor
[2019-11-25 05:30:39 +0000] [11] [CRITICAL] WORKER TIMEOUT (pid:17337)
[2019-11-25 05:30:39 +0000] [17337] [INFO] Worker exiting (pid: 17337)
[2019-11-25 05:30:39,430] {dagbag.py:90} INFO - Filling up the DagBag from /home/airflow/airflow/dags
[2019-11-25 05:30:39,472] {dagbag.py:90} INFO - Filling up the DagBag from /home/airflow/airflow/dags
[2019-11-25 05:30:39,479] {__init__.py:51} INFO - Using executor CeleryExecutor
[2019-11-25 05:30:39,447] {__init__.py:51} INFO - Using executor CeleryExecutor
[2019-11-25 05:30:39,524] {__init__.py:51} INFO - Using executor CeleryExecutor
[2019-11-25 05:30:39,719] {__init__.py:51} INFO - Using executor CeleryExecutor
[2019-11-25 05:30:39,930] {dagbag.py:90} INFO - Filling up the DagBag from /home/airflow/airflow/dags
[2019-11-25 05:30:40,139] {dagbag.py:90} INFO - Filling up the DagBag from /home/airflow/airflow/dags
[2019-11-25 05:30:40,244] {dagbag.py:90} INFO - Filling up the DagBag from /home/airflow/airflow/dags
[2019-11-25 05:30:40 +0000] [11] [CRITICAL] WORKER TIMEOUT (pid:17338)
[2019-11-25 05:30:40 +0000] [11] [CRITICAL] WORKER TIMEOUT (pid:17339)
[2019-11-25 05:30:40 +0000] [17393] [INFO] Booting worker with pid: 17393
[2019-11-25 05:30:40,412] {dagbag.py:90} INFO - Filling up the DagBag from /home/airflow/airflow/dags

The ec2-instance is running Amazon Linux 2, and I can see these logs repeating in /var/log/messages:

Nov 25 05:57:15 ip-172-31-67-43 ec2net: [rewrite_aliases] Rewriting aliases of eth0
Nov 25 05:58:16 ip-172-31-67-43 dhclient[2724]: XMT: Solicit on eth0, interval 131000ms.
Nov 25 06:00:27 ip-172-31-67-43 dhclient[2724]: XMT: Solicit on eth0, interval 127900ms.
Nov 25 06:01:01 ip-172-31-67-43 systemd: Created slice User Slice of root.
Nov 25 06:01:01 ip-172-31-67-43 systemd: Starting User Slice of root.
Nov 25 06:01:01 ip-172-31-67-43 systemd: Started Session 77 of user root.
Nov 25 06:01:01 ip-172-31-67-43 systemd: Starting Session 77 of user root.
Nov 25 06:01:01 ip-172-31-67-43 systemd: Removed slice User Slice of root.
Nov 25 06:01:01 ip-172-31-67-43 systemd: Stopping User Slice of root.
Nov 25 06:02:35 ip-172-31-67-43 dhclient[2724]: XMT: Solicit on eth0, interval 131620ms.
Nov 25 06:04:36 ip-172-31-67-43 systemd: Started Session 78 of user ec2-user.
Nov 25 06:04:36 ip-172-31-67-43 systemd-logind: New session 78 of user ec2-user.
Nov 25 06:04:36 ip-172-31-67-43 systemd: Starting Session 78 of user ec2-user.
Nov 25 06:04:46 ip-172-31-67-43 dhclient[2724]: XMT: Solicit on eth0, interval 125300ms.
Nov 25 06:06:52 ip-172-31-67-43 dhclient[2724]: XMT: Solicit on eth0, interval 115230ms.
Nov 25 06:08:47 ip-172-31-67-43 dhclient[2724]: XMT: Solicit on eth0, interval 108100ms.

2 Answers:

Answer 0 (score: 1)

Regarding these timeout errors:

[CRITICAL] WORKER TIMEOUT

You can set the Gunicorn timeouts via the following two Airflow environment variables:

AIRFLOW__WEBSERVER__WEB_SERVER_MASTER_TIMEOUT

Number of seconds the webserver waits before killing a gunicorn master that doesn't respond.

AIRFLOW__WEBSERVER__WEB_SERVER_WORKER_TIMEOUT

Number of seconds the gunicorn webserver waits before timing out a worker.

See the Airflow documentation for more information.

I had to do this to fix my own broken Airflow installation: I set both timeouts to 300 seconds, which let the Airflow web UI load so I could then debug the root cause of the slow page loads.
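In concrete terms, that fix might look like this in the container environment (300 is the value used above; it's a starting point for debugging, not a recommended production setting):

```shell
# Raise both Gunicorn timeouts to 300 seconds (example values) so the UI
# can load long enough to find out why page loads are slow.
export AIRFLOW__WEBSERVER__WEB_SERVER_MASTER_TIMEOUT=300
export AIRFLOW__WEBSERVER__WEB_SERVER_WORKER_TIMEOUT=300
```

In ECS these would be added to the task definition's environment rather than exported in a shell.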

Answer 1 (score: 0)

I was getting this error when deploying Airflow with AIRFLOW_HOME set to my EFS mount. Setting it to ~/airflow fixed the issue.
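A minimal sketch of that change, assuming slow I/O on the network filesystem was starving the workers (the EFS mount path below is hypothetical):

```shell
# Keep AIRFLOW_HOME (airflow.cfg, logs, the webserver PID file) on local disk;
# slow metadata I/O on a network mount can make gunicorn workers miss their
# heartbeat and get killed with WORKER TIMEOUT.
export AIRFLOW_HOME="$HOME/airflow"

# If DAGs must be shared across instances, point only the dags folder at EFS:
export AIRFLOW__CORE__DAGS_FOLDER=/mnt/efs/dags   # hypothetical mount path
```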