Nginx monitoring fails to load the status page under load

Time: 2015-12-16 18:42:13

Tags: nginx zabbix sysctl

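The nginx monitoring script from ZTC fails to load the nginx test page. This happens mostly at nginx's peak load of about 2000 rps (nginx is used as a proxy). It produces errors such as "nginx is down" in Zabbix, and a second later everything seems fine again.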

 [NginxStatus] 2015-12-16 20:24:55,289 - ERROR: failed to load test page
    Traceback (most recent call last):
      File "/usr/lib/python2.6/site-packages/ztc/nginx/__init__.py", line 56, in _read_status
        u = urllib2.urlopen(url, None, 1)
      File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen
        return _opener.open(url, data, timeout)
      File "/usr/lib64/python2.6/urllib2.py", line 391, in open
        response = self._open(req, data)
      File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
        '_open', req)
      File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
        result = func(*args)
      File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
        return self.do_open(httplib.HTTPConnection, req)
      File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open
        raise URLError(err)
    URLError: <urlopen error timed out>
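The traceback shows that ZTC calls urllib2.urlopen(url, None, 1). That gives the status page a 1-second timeout, so any response slower than one second is reported as a failure. One way to tell a slow status page from an unreachable one is to probe it with a longer timeout and record the latency. Below is a minimal Python 3 sketch of such a probe. It is not part of ZTC, and the function name probe_status is hypothetical.

```python
import http.client
import time
import urllib.request


def probe_status(url, timeout=1.0):
    """Fetch url once and return (elapsed_seconds, body).

    body is None when the request fails or exceeds the timeout, which
    corresponds to the "failed to load test page" case in the ZTC log.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("ascii", "replace")
    except (OSError, http.client.HTTPException):
        # URLError and socket timeouts are both OSError subclasses; a timeout
        # while waiting for the response is not always wrapped in URLError.
        return time.monotonic() - start, None
    return time.monotonic() - start, body
```

For example, calling probe_status once per second during peak load with timeout=5.0 would show whether the page answers in just over one second (slow but alive) or never answers at all. The URL to probe is whatever URL ZTC is configured with.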

Since it only happens at peak load (around 2000 rps), I suspect some kernel parameter is behind it.

Here is the nginx config:

user nginx;
worker_processes  4;
timer_resolution 100ms;
worker_priority -15;
worker_rlimit_nofile 200000;

error_log  /var/log/nginx/error.log;
pid        /var/run/nginx.pid;

events {
  worker_connections  65536;
  use epoll;
  multi_accept on;
}
http {

  include       /etc/nginx/mime.types;
  default_type  application/octet-stream;

  server_tokens off;

  access_log    /var/log/nginx/access.log;

  sendfile on;
  tcp_nopush on;
  tcp_nodelay on;

#  keepalive_requests 120;
#  keepalive_timeout  65;


  gzip  on;
  gzip_http_version 1.0;
  gzip_comp_level 2;
  gzip_proxied any;
  gzip_vary off;
  gzip_types text/plain text/css application/x-javascript text/xml application/xml application/rss+xml application/atom+xml text/javascript application/javascript application/json text/mathml;
  gzip_min_length  1000;
  gzip_disable     "MSIE [1-6]\.";


  variables_hash_max_size 1024;
  variables_hash_bucket_size 64;
  server_names_hash_bucket_size 64;
  types_hash_max_size 2048;
  types_hash_bucket_size 64;



  include /etc/nginx/conf.d/*.conf;
  include /etc/nginx/sites-enabled/*;
}
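A rough sanity check on the config above can be done with simple arithmetic. The nginx documentation notes that worker_connections counts all of a worker's connections, including connections to proxied servers. A worker also cannot hold more connections than its open-file limit (worker_rlimit_nofile). The Python sketch below does this arithmetic. Halving the total for a reverse proxy is a common rule of thumb (one client connection plus one upstream connection per in-flight request), not an exact limit. The function name is hypothetical.

```python
def nginx_connection_budget(worker_processes, worker_connections,
                            worker_rlimit_nofile, proxy=True):
    """Rough upper bounds on concurrent connections for an nginx config.

    Approximation: each connection uses one file descriptor, and in proxy
    mode each client request also holds an upstream connection. Log files
    and listening sockets use descriptors too, so real limits are slightly lower.
    """
    per_worker = min(worker_connections, worker_rlimit_nofile)
    total = worker_processes * per_worker
    clients = total // 2 if proxy else total
    return {
        "per_worker_slots": per_worker,
        "total_slots": total,
        "approx_max_clients": clients,
    }


# Values taken from the nginx.conf above.
budget = nginx_connection_budget(worker_processes=4,
                                 worker_connections=65536,
                                 worker_rlimit_nofile=200000)
print(budget)
# {'per_worker_slots': 65536, 'total_slots': 262144, 'approx_max_clients': 131072}
```

These estimates can be compared with the socket counts in the netstat output further down.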

Here is sysctl.conf:

net.ipv4.conf.all.secure_redirects=0
net.ipv4.conf.all.send_redirects=0
net.ipv4.tcp_max_syn_backlog=20480
net.ipv4.tcp_synack_retries=2
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 65536 16777216
net.netfilter.nf_conntrack_max=1048576
net.nf_conntrack_max=1048576
net.ipv4.tcp_no_metrics_save=1
net.ipv4.tcp_tw_reuse=1
net.core.somaxconn=15000
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.ipv4.tcp_keepalive_time=60
net.ipv4.tcp_keepalive_intvl=15
net.ipv4.tcp_keepalive_probes=5
net.ipv4.tcp_max_tw_buckets=720000
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_fin_timeout=30
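Before blaming a particular setting, it may help to confirm which of these values the running kernel actually uses. Entries in sysctl.conf only take effect once they are loaded (for example with sysctl -p or at boot), and a key the running kernel does not provide cannot be applied. The Python sketch below is a hypothetical helper, not a standard tool. It parses sysctl.conf-style text and compares each key with /proc/sys. It relies on the usual mapping in which the dots in a key become path separators (net.core.somaxconn → /proc/sys/net/core/somaxconn). Keys that contain an interface name with a dot in it would need special handling.

```python
import os


def parse_sysctl_conf(text):
    """Return {key: value} from sysctl.conf-style text, whitespace-normalized."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blanks and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        settings[key.strip()] = " ".join(value.split())
    return settings


def live_value(key, root="/proc/sys"):
    """Read the kernel's current value for key, or None if it does not exist."""
    path = os.path.join(root, *key.split("."))
    try:
        with open(path) as fh:
            # Multi-value entries such as tcp_rmem are tab-separated in /proc.
            return " ".join(fh.read().split())
    except OSError:
        return None


def diff_against_kernel(settings, root="/proc/sys"):
    """Return {key: (configured, live)} for every key whose live value differs."""
    mismatches = {}
    for key, wanted in settings.items():
        actual = live_value(key, root)
        if actual != wanted:
            mismatches[key] = (wanted, actual)
    return mismatches
```

On the server, diff_against_kernel(parse_sysctl_conf(open("/etc/sysctl.conf").read())) would list only the keys whose live value differs from the file. A live value of None means the key does not exist under /proc/sys on that kernel.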

netstat output:

netstat -an | grep -e :80 -e :443 |awk '/^tcp/ {A[$(NF)]++} END {for (I in A) {printf "%5d %s\n", A[I], I}}'

18525 TIME_WAIT
    1 CLOSE_WAIT
  499 FIN_WAIT1
 1544 FIN_WAIT2
33311 ESTABLISHED
  563 SYN_RECV
    7 CLOSING
  294 LAST_ACK
    3 LISTEN
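One caveat applies when reading these numbers. The filter grep -e :80 -e :443 matches those strings anywhere on the line. It therefore also counts sockets whose remote port is 80 or 443 (for example nginx's own upstream connections, if the backends listen on port 80) and ports such as :8080 or :4430. The Python sketch below is a hypothetical helper, not part of netstat. It tallies states only for sockets whose local port is exactly 80 or 443, and it assumes the usual netstat -an column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State).

```python
from collections import Counter


def count_tcp_states(netstat_text, local_ports=(80, 443)):
    """Count TCP socket states in `netstat -an` output, matching the exact local port."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected TCP row: Proto Recv-Q Send-Q Local-Address Foreign-Address State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # headers, UDP and unix-socket lines
        # rsplit on the last ":" also handles IPv6 addresses such as ":::443".
        port = fields[3].rsplit(":", 1)[-1]
        if port.isdigit() and int(port) in local_ports:
            counts[fields[5]] += 1
    return counts
```

The text can be obtained with subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout. Counting by exact local port keeps client-facing sockets separate from upstream sockets, which use ephemeral local ports.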

What could be causing this? Are these netstat numbers abnormal for 2000 rps? Is there a mistake in my sysctl.conf that could be causing the problem?
