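PHP FPM 7.1 socket leak causes NGINX 504 Gateway Timeout

Time: 2017-09-15 23:57:10

Tags: php nginx amazon-ec2 fpm http-status-code-504

I use Laravel Forge to spin up my EC2 environment, which sets up a LEMP stack for me. I recently started getting 504 timeouts on requests.

I don't have a sysadmin (hence the Forge subscription), but I went through the logs and narrowed the problem down to these two recurring entries:

In /var/log/nginx/default-error.log: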

2017/09/15 09:32:17 [error] 2308#2308: *1 upstream timed out (110: Connection timed out) while sending request to upstream, client: x.x.x.x, server: xxxx.com, request: "POST /upload HTTP/2.0", upstream: "fastcgi://unix:/var/run/php/php7.1-fpm.sock", host: "xxxx.com", referrer: "https://xxxx.com/rest/of/the/path"

In /var/log/php7.1-fpm-log:

[15-Sep-2017 09:35:09] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 14 total children
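
To track how often the pool runs out of idle workers, the warning above can be parsed into numbers. The following is a minimal sketch, assuming the log line keeps exactly the wording quoted here; the function name `parse_busy_warning` and the regular expression are illustrative, not part of PHP-FPM.

```python
import re

# Pattern modelled on the single warning quoted above (assumption:
# other PHP-FPM versions may word this message differently).
BUSY_RE = re.compile(
    r"\[pool (?P<pool>[^\]]+)\] seems busy .*?"
    r"spawning (?P<spawning>\d+) children, "
    r"there are (?P<idle>\d+) idle, and (?P<total>\d+) total children"
)


def parse_busy_warning(line):
    """Return the pool name and child counts from a 'seems busy' line, or None."""
    match = BUSY_RE.search(line)
    if match is None:
        return None
    return {
        "pool": match.group("pool"),
        "spawning": int(match.group("spawning")),
        "idle": int(match.group("idle")),
        "total": int(match.group("total")),
    }


sample = (
    "[15-Sep-2017 09:35:09] WARNING: [pool www] seems busy "
    "(you may need to increase pm.start_servers, or pm.min/max_spare_servers), "
    "spawning 8 children, there are 0 idle, and 14 total children"
)
info = parse_busy_warning(sample)
print(info)
```

Zero idle workers with the total still climbing suggests that requests are holding workers for a long time, not that the pool is merely too small at startup.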

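It looks as if FPM opens connections that never die, and in my RDS load logs I can see RAM constantly maxed out.

I have tried:

  • Rolling back to a stable version of my application (from 2 months ago)
  • Rebuilding the EC2 instance with PHP 5.6, 7.0, and 7.1 (and their respective FPM versions)
  • Doing all of the above on Ubuntu 14.04 and 16.04
  • Provisioning a larger RDS instance

The only thing that works right now is a powerful RDS instance (8 GB RAM) combined with killing the FPM pool connections every 300 requests (pm.max_requests = 300). But throwing resources at the problem is obviously not a real solution.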

Here is the configuration in /etc/php/7.1/fpm/pool.d/www.conf:

user = forge
group = forge
listen = /run/php/php7.1-fpm.sock
listen.owner = www-data
listen.group = www-data
listen.mode = 0666
pm = dynamic
pm.max_children = 30
pm.start_servers = 7
pm.min_spare_servers = 6
pm.max_spare_servers = 10
pm.process_idle_timeout = 7s;
pm.max_requests = 300
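
As a rough way to see why a pool of this size can still time out, the ceiling on throughput can be estimated from `pm.max_children` and the time each request holds a worker. This is a back-of-the-envelope sketch, assuming every request occupies exactly one worker for its whole duration; the function name and the two request durations are hypothetical values chosen for illustration.

```python
def max_throughput(max_children, seconds_per_request):
    """Upper bound on requests per second when every worker stays busy."""
    if seconds_per_request <= 0:
        raise ValueError("seconds_per_request must be positive")
    return max_children / seconds_per_request


MAX_CHILDREN = 30  # pm.max_children from the pool configuration above

# Hypothetical request durations, not measurements from this server.
fast = max_throughput(MAX_CHILDREN, 0.25)  # requests that finish quickly
slow = max_throughput(MAX_CHILDREN, 15.0)  # requests waiting on a slow backend

print(fast, slow)
```

With these assumed durations the ceiling falls from 120 to 2 requests per second. Any traffic above that queues at the socket until NGINX gives up with a 504.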

Here is the nginx.conf configuration:

listen 80;
listen [::]:80;
listen 443 ssl http2;
listen [::]:443 ssl http2;
server_name xxxx.com;
root /home/forge/xxxx.com/public;

# FORGE SSL (DO NOT REMOVE!)
ssl_certificate /etc/nginx/ssl/xxxx.com/111111/server.crt;
ssl_certificate_key /etc/nginx/ssl/xxxx.com/111111/server.key;

ssl_protocols xxxx;
ssl_ciphers ...;
ssl_prefer_server_ciphers on;
ssl_dhparam /etc/nginx/dhparams.pem;

add_header X-Frame-Options "SAMEORIGIN";
add_header X-XSS-Protection "1; mode=block";
add_header X-Content-Type-Options "nosniff";

index index.html index.htm index.php;

charset utf-8;

# FORGE CONFIG (DOT NOT REMOVE!)
include forge-conf/xxxx.com/server/*;

location / {
    try_files $uri $uri/ /index.php?$query_string;
}

location = /favicon.ico { access_log off; log_not_found off; }
location = /robots.txt  { access_log off; log_not_found off; }

access_log /var/log/nginx/xxxx.com-access.log;
error_log  /var/log/nginx/xxxx.com-error.log error;

error_page 404 /index.php;

location ~ \.php$ {
    fastcgi_split_path_info ^(.+\.php)(/.+)$;
    fastcgi_pass unix:/var/run/php/php7.1-fpm.sock;
    fastcgi_index index.php;
    fastcgi_read_timeout 60;
    include fastcgi_params;
}

location ~ /\.(?!well-known).* {
    deny all;
}

location ~* \.(?:ico|css|js|gif|jpe?g|png)$ {
    expires 30d;
    add_header Pragma public;
    add_header Cache-Control "public";
}

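1 answer:

Answer 0 (score: 0)

OK, after a lot of debugging and testing, I found a couple of causes.

  • The main cause in my case: the AWS RDS instance I use for MySQL had 500 MB of RAM. Looking back, all of these problems started once the database grew beyond 400 MB.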

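    • Solution: make sure the instance always has about 2x as much RAM as the size of your database. Otherwise the whole B+ tree does not fit in memory, so the server has to swap constantly. That can push query times above 15 seconds.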
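  • The main cause of problems like this in general: unoptimized SQL queries.

    • Solution: keep a dataset on your localhost that is similar in size to the data on the server, so that slow queries show up during development.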
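
The sizing rule from the first cause can be written as a quick check. This is a minimal sketch that reads the rule of thumb as "RAM should be roughly twice the database size", which is an interpretation of the answer and not an official MySQL or AWS guideline; the function names and the `factor` parameter are hypothetical.

```python
MB = 1024 * 1024


def ram_needed(db_bytes, factor=2.0):
    """RAM suggested by the rule of thumb: factor times the database size."""
    return factor * db_bytes


def has_enough_ram(db_bytes, ram_bytes, factor=2.0):
    """True when the instance RAM meets the rule of thumb for this database."""
    return ram_bytes >= ram_needed(db_bytes, factor)


# Figures taken from the answer: a 400 MB database on a 500 MB instance,
# then the same database on the 8 GB instance mentioned in the question.
small_instance_ok = has_enough_ram(400 * MB, 500 * MB)
large_instance_ok = has_enough_ram(400 * MB, 8 * 1024 * MB)

print(small_instance_ok, large_instance_ok)
```

Under this reading the 500 MB instance fails the check for a 400 MB database, while the 8 GB instance passes with plenty of headroom.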