Nginx high-volume traffic load balancing

Posted: 2012-09-02 07:09:40

Tags: nginx tomcat6 load-balancing

For the past 3 weeks we have been testing Nginx as a load balancer. So far we have not managed to handle more than 1000 req/sec and 18K active connections. When we hit those numbers, Nginx starts to hang and returns timeout codes. The only way to get responses again is to reduce the number of connections drastically.

I should note that my servers can and do handle this volume of traffic every day; we currently use simple round-robin DNS balancing.

We are using a dedicated server with the following hardware:

  • Intel Xeon E5620 CPU
  • 16GB RAM
  • 2TB SATA HDD
  • 1 Gb/s network connection
  • OS: CentOS 5.8

We need to load-balance across 7 servers running Tomcat 6, handling more than 2000 req/sec at peak, for both HTTP and HTTPS requests.

While Nginx is running, CPU consumption is around 15% and RAM usage around 100 MB.

My questions are:

  1. Has anyone tried using nginx to balance this kind of traffic?
  2. Do you think nginx can handle such traffic?
  3. Do you have any idea what could be causing the hangs?
  4. Is there anything missing in my configuration?
  5. Here are my configuration files:

    nginx.conf:

    user  nginx;
    worker_processes 10;
    
    worker_rlimit_nofile 200000;
    
    error_log  /var/log/nginx/error.log warn;
    pid        /var/run/nginx.pid;
    
    
    events {
        worker_connections  10000;
        use epoll;
        multi_accept on;
    }
    
    
    http {
        include       /etc/nginx/mime.types;
        default_type  application/octet-stream;
    
        log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                          '$status $body_bytes_sent "$http_referer" '
                          '"$http_user_agent" "$http_x_forwarded_for"';
    
        #access_log  /var/log/nginx/access.log  main;
        access_log off;
    
        sendfile        on;
        tcp_nopush     on;
    
        keepalive_timeout  65;
        reset_timedout_connection on;
    
        gzip  on;
        gzip_comp_level 1;
        include /etc/nginx/conf.d/*.conf;
    }
    

    servers.conf:

    #Set the upstream (servers to load balance)
    #HTTP stream
    upstream adsbar {
      least_conn;
      server xx.xx.xx.34 max_fails=2 fail_timeout=15s;
      server xx.xx.xx.36 max_fails=2 fail_timeout=15s;
      server xx.xx.xx.37 max_fails=2 fail_timeout=15s;
      server xx.xx.xx.39 max_fails=2 fail_timeout=15s;
      server xx.xx.xx.40 max_fails=2 fail_timeout=15s;
      server xx.xx.xx.42 max_fails=2 fail_timeout=15s;
      server xx.xx.xx.43 max_fails=2 fail_timeout=15s;
    }      
    
    #HTTPS stream
    upstream adsbar-ssl {
      least_conn;
      server xx.xx.xx.34:443 max_fails=2 fail_timeout=15s;
      server xx.xx.xx.36:443 max_fails=2 fail_timeout=15s;
      server xx.xx.xx.37:443 max_fails=2 fail_timeout=15s;
      server xx.xx.xx.39:443 max_fails=2 fail_timeout=15s;
      server xx.xx.xx.40:443 max_fails=2 fail_timeout=15s;
      server xx.xx.xx.42:443 max_fails=2 fail_timeout=15s;
      server xx.xx.xx.43:443 max_fails=2 fail_timeout=15s;
    }
    
    #HTTP
    server {
      listen xxx.xxx.xxx.xxx:8080;
      server_name www.mycompany.com;
      location / {
          proxy_set_header Host $host;
          # So the original HTTP Host header is preserved
          proxy_set_header X-Real-IP $remote_addr;
          # The IP address of the client (which might be a proxy itself)
          proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
          proxy_pass http://adsbar;
      }
    }
    
    #HTTPS
    server {
      listen xxx.xxx.xxx.xxx:8443;
      server_name www.mycompany.com;
      ssl on;
      ssl_certificate /etc/pki/tls/certs/mycompany.crt;
      # Path to an SSL certificate;
      ssl_certificate_key /etc/pki/tls/private/mycompany.key;
      # Path to the key for the SSL certificate;
      location / {
          proxy_set_header Host $host;
          # So the original HTTP Host header is preserved
          proxy_set_header X-Real-IP $remote_addr;
          # The IP address of the client (which might be a proxy itself)
          proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
          proxy_pass https://adsbar-ssl;
      }
    }
    
    server {
        listen xxx.xxx.xxx.xxx:61709;
        location /nginx_status {
            stub_status on;
            access_log off;
            allow 127.0.0.1;
            deny all;
        }
    }
    

    sysctl.conf:

    # Kernel sysctl configuration file for Red Hat Linux
    #
    # For binary values, 0 is disabled, 1 is enabled.  See sysctl(8) and
    # sysctl.conf(5) for more details.
    
    # Controls IP packet forwarding
    net.ipv4.ip_forward = 0
    
    # Controls source route verification
    net.ipv4.conf.default.rp_filter = 1
    
    # Do not accept source routing
    net.ipv4.conf.default.accept_source_route = 0
    
    # Controls the System Request debugging functionality of the kernel
    kernel.sysrq = 1
    
    # Controls whether core dumps will append the PID to the core filename
    # Useful for debugging multi-threaded applications
    kernel.core_uses_pid = 1
    
    # Controls the use of TCP syncookies
    net.ipv4.tcp_syncookies = 1
    
    # Controls the maximum size of a message, in bytes
    kernel.msgmnb = 65536
    
    # Controls the default maximum size of a message queue
    kernel.msgmax = 65536
    
    # Controls the maximum shared segment size, in bytes
    kernel.shmmax = 68719476736
    
    # Controls the maximum number of shared memory segments, in pages
    kernel.shmall = 4294967296
    
    fs.file-max = 120000
    net.ipv4.ip_conntrack_max = 131072
    net.ipv4.tcp_max_syn_backlog = 8196
    net.ipv4.tcp_fin_timeout = 25
    net.ipv4.tcp_keepalive_time = 3600
    net.ipv4.ip_local_port_range = 1024 65000
    net.ipv4.tcp_rmem = 4096 25165824 25165824
    net.core.rmem_max = 25165824
    net.core.rmem_default = 25165824
    net.ipv4.tcp_wmem = 4096 65536 25165824
    net.core.wmem_max = 25165824
    net.core.wmem_default = 65536
    net.core.optmem_max = 25165824
    net.core.netdev_max_backlog = 2500
    net.ipv4.tcp_tw_recycle = 1
    net.ipv4.tcp_tw_reuse = 1
    

    Any help, guidance, or ideas would be highly appreciated.

3 Answers:

Answer 0 (score: 17)

Here are some good references:

http://dak1n1.com/blog/12-nginx-performance-tuning

Server Fault: https://serverfault.com/questions/221292/tips-for-maximizing-nginx-requests-sec

The well-documented config from the dak1n1 link:

# This number should be, at maximum, the number of CPU cores on your system. 
# (since nginx doesn't benefit from more than one worker per CPU.)
worker_processes 24;

# Number of file descriptors used for Nginx. This is set in the OS with 'ulimit -n 200000'
# or using /etc/security/limits.conf
worker_rlimit_nofile 200000;
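# For example, the matching /etc/security/limits.conf entries might look like
# this (illustrative values, assuming the workers run as the 'nginx' user):
#   nginx  soft  nofile  200000
#   nginx  hard  nofile  200000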


# only log critical errors
error_log /var/log/nginx/error.log crit;


# Determines how many clients will be served by each worker process.
# (Max clients = worker_connections * worker_processes)
# "Max clients" is also limited by the number of socket connections available on the system (~64k)
worker_connections 4000;
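# (Note: worker_connections, like 'use epoll' and 'multi_accept' below, belongs
#  inside the events{} block. As a worked example of the formula above:
#  24 workers * 4000 connections ≈ 96,000 max clients.)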


# essential for linux, optimized to serve many clients with each thread
use epoll;


# Accept as many connections as possible, after nginx gets notification about a new connection.
# May flood worker_connections, if that option is set too low.
multi_accept on;


# Caches information about open FDs, frequently accessed files.
# Changing this setting, in my environment, brought performance up from 560k req/sec, to 904k req/sec.
# I recommend using some variant of these options, though not the specific values listed below.
open_file_cache max=200000 inactive=20s; 
open_file_cache_valid 30s; 
open_file_cache_min_uses 2;
open_file_cache_errors on;


# Buffer log writes to speed up IO, or disable them altogether
#access_log /var/log/nginx/access.log main buffer=16k;
access_log off;


# Sendfile copies data between one FD and other from within the kernel. 
# More efficient than read() + write(), since that requires transferring data to and from user space.
sendfile on; 


# Tcp_nopush causes nginx to attempt to send its HTTP response headers in one packet, 
# instead of using partial frames. This is useful for prepending headers before calling sendfile, 
# or for throughput optimization.
tcp_nopush on;


# don't buffer data-sends (disable Nagle algorithm). Good for sending frequent small bursts of data in real time.
tcp_nodelay on; 


# Timeout for keep-alive connections. Server will close connections after this time.
keepalive_timeout 30;


# Number of requests a client can make over the keep-alive connection. This is set high for testing.
keepalive_requests 100000;


# allow the server to close the connection after a client stops responding. Frees up socket-associated memory.
reset_timedout_connection on;


# send the client a "request timed out" if the body is not loaded by this time. Default 60.
client_body_timeout 10;


# If the client stops reading data, free up the stale client connection after this much time. Default 60.
send_timeout 2;


# Compression. Reduces the amount of data that needs to be transferred over the network
gzip on;
gzip_min_length 10240;
gzip_proxied expired no-cache no-store private auth;
gzip_types text/plain text/css text/xml text/javascript application/x-javascript application/xml;
gzip_disable "MSIE [1-6]\.";

And here is more on Linux system tuning, for sysctl.conf:

# Increase system IP port limits to allow for more connections

net.ipv4.ip_local_port_range = 2000 65000


net.ipv4.tcp_window_scaling = 1


# number of packets to keep in backlog before the kernel starts dropping them 
net.ipv4.tcp_max_syn_backlog = 3240000


# increase socket listen backlog
net.core.somaxconn = 3240000
net.ipv4.tcp_max_tw_buckets = 1440000


# Increase TCP buffer sizes
net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_congestion_control = cubic
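
These sysctl settings can be applied without a reboot (assuming they are saved in /etc/sysctl.conf):

sysctl -p /etc/sysctl.conf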

Answer 1 (score: 2)

nginx is definitely capable of handling more than 1000 req/s (I get about 2800 req/s out of nginx when playing with jmeter on a portable laptop, using roughly one and a half cores).

You are using epoll, which as I understand it is the optimal choice on current Linux kernels.

You have turned access_log off, so disk IO shouldn't be a bottleneck either (note: you can also run access_log in buffered mode, with one large buffer that is only written out after every x KB; that keeps the disk from being hammered constantly while still preserving the logs for analysis).
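
A sketch of that buffered form (the 32k buffer size is illustrative):

access_log /var/log/nginx/access.log main buffer=32k;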

My understanding is that to maximize nginx performance you normally set worker_processes equal to the number of cores/CPUs, and then scale worker_connections up to allow more concurrent connections (together with the OS open-files limit). Yet in the data you posted above you have a quad-core CPU with 10 worker processes, each allowing 10k connections. On the nginx side I would try something like:

worker_processes 4;
worker_rlimit_nofile 999999;
events {
  worker_connections 32768;
  use epoll;
  multi_accept on;
}

On the kernel side I would tune the TCP read and write buffers differently: you want a small minimum, a small default, and a large maximum.

You have already bumped up the ephemeral port range.

I would raise the open-files limit even higher, since you will have lots of open sockets.

Add or change the following lines in /etc/sysctl.conf:
net.ipv4.tcp_rmem = 4096 4096 25165824                                
net.ipv4.tcp_wmem = 4096 4096 25165824
fs.file-max=999999

Hope that helps.

Answer 2 (score: 2)

I have found that using the least-connections algorithm was problematic. I switched to a different balancing method and found the service to respond much faster.
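
For illustration, such a switch is a one-line change in the upstream block; ip_hash is shown here as just one possible alternative (omitting the directive entirely falls back to nginx's default round-robin):

upstream adsbar {
  ip_hash;   # one alternative; remove this line for default round-robin
  server xx.xx.xx.34 max_fails=2 fail_timeout=15s;
  server xx.xx.xx.36 max_fails=2 fail_timeout=15s;
}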