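I'm running a very high-traffic Rails 3.2 application on Unicorn and Nginx (multiple web nodes), but every so often I see Unicorn workers start timing out and getting reaped by the Unicorn master on all nodes. Of course, when a worker is reaped by the master, a new worker is forked in its place, but that one also hangs for 60 seconds, then times out and is killed. This basically repeats until I kill all the Unicorn masters and workers.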
Unicorn log:
E, [2013-04-18T12:57:50.007623 #14002] ERROR -- : worker=8 PID:14968 timeout (62s > 60s), killing
E, [2013-04-18T12:57:50.108364 #14002] ERROR -- : reaped #<Process::Status: pid 14968 SIGKILL (signal 9)> worker=8
I, [2013-04-18T12:57:50.489505 #15726] INFO -- : worker=8 ready
E, [2013-04-18T12:57:52.175842 #14002] ERROR -- : worker=5 PID:15033 timeout (61s > 60s), killing
E, [2013-04-18T12:57:52.276586 #14002] ERROR -- : reaped #<Process::Status: pid 15033 SIGKILL (signal 9)> worker=5
I, [2013-04-18T12:57:52.653069 #15782] INFO -- : worker=5 ready
E, [2013-04-18T12:57:56.340290 #14002] ERROR -- : worker=3 PID:15074 timeout (61s > 60s), killing
E, [2013-04-18T12:57:56.440993 #14002] ERROR -- : reaped #<Process::Status: pid 15074 SIGKILL (signal 9)> worker=3
I, [2013-04-18T12:57:56.809730 #15832] INFO -- : worker=3 ready
E, [2013-04-18T12:57:57.504142 #14002] ERROR -- : worker=7 PID:15087 timeout (61s > 60s), killing
E, [2013-04-18T12:57:57.604886 #14002] ERROR -- : reaped #<Process::Status: pid 15087 SIGKILL (signal 9)> worker=7
I, [2013-04-18T12:57:57.983581 #15845] INFO -- : worker=7 ready
E, [2013-04-18T12:57:59.669664 #14002] ERROR -- : worker=4 PID:15108 timeout (61s > 60s), killing
E, [2013-04-18T12:57:59.770427 #14002] ERROR -- : reaped #<Process::Status: pid 15108 SIGKILL (signal 9)> worker=4
I, [2013-04-18T12:58:00.155461 #15879] INFO -- : worker=4 ready
E, [2013-04-18T12:58:06.839906 #14002] ERROR -- : worker=9 PID:15192 timeout (61s > 60s), killing
E, [2013-04-18T12:58:06.940829 #14002] ERROR -- : reaped #<Process::Status: pid 15192 SIGKILL (signal 9)> worker=9
I, [2013-04-18T12:58:07.302766 #15956] INFO -- : worker=9 ready
E, [2013-04-18T12:58:08.003330 #14002] ERROR -- : worker=6 PID:15213 timeout (61s > 60s), killing
E, [2013-04-18T12:58:08.104006 #14002] ERROR -- : reaped #<Process::Status: pid 15213 SIGKILL (signal 9)> worker=6
I, [2013-04-18T12:58:08.466790 #15973] INFO -- : worker=6 ready
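To see at a glance how many workers are affected, the log can be tallied per worker. This is only a rough sketch: the regular expression is derived from the log lines pasted above, and the names `TIMEOUT_RE`, `summarize_timeouts`, and `sample` are made up for illustration.

```python
import re
from collections import Counter

# Matches the "timeout ... killing" lines written by the Unicorn master,
# in the format shown in the log above.
TIMEOUT_RE = re.compile(
    r"worker=(?P<worker>\d+) PID:(?P<pid>\d+) "
    r"timeout \((?P<elapsed>\d+)s > (?P<limit>\d+)s\), killing"
)


def summarize_timeouts(lines):
    """Return a Counter mapping worker number -> number of timeout kills."""
    kills = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            kills[int(match.group("worker"))] += 1
    return kills


# A few lines copied from the log above; in practice, read the real log file.
sample = [
    "E, [2013-04-18T12:57:50.007623 #14002] ERROR -- : worker=8 PID:14968 timeout (62s > 60s), killing",
    "E, [2013-04-18T12:57:50.108364 #14002] ERROR -- : reaped #<Process::Status: pid 14968 SIGKILL (signal 9)> worker=8",
    "I, [2013-04-18T12:57:50.489505 #15726] INFO -- : worker=8 ready",
    "E, [2013-04-18T12:57:52.175842 #14002] ERROR -- : worker=5 PID:15033 timeout (61s > 60s), killing",
]

timeout_kills = summarize_timeouts(sample)
for worker, count in sorted(timeout_kills.items()):
    print(f"worker={worker} killed for timeout {count} time(s)")
```

If every worker number shows up at roughly the same rate, that would be consistent with something shared by all workers rather than one bad request pinned to a single process.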
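Our monitoring shows that the external services (Postgres database, Memcached, Redis) are all responding correctly and have no latency problems.
Here is some output that may be useful:
During these outages I noticed a large number of attempted connections to the Unicorn socket. When the site is not down, the following command usually returns only one or two lines.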
netstat | grep unic
....
unix 2 [ ] STREAM CONNECTING 0 /tmp/unicorn.sock
unix 2 [ ] STREAM CONNECTING 0 /tmp/unicorn.sock
unix 2 [ ] STREAM CONNECTING 0 /tmp/unicorn.sock
unix 2 [ ] STREAM CONNECTING 0 /tmp/unicorn.sock
unix 2 [ ] STREAM CONNECTING 0 /tmp/unicorn.sock
unix 2 [ ] STREAM CONNECTED 7768134 /tmp/unicorn.sock
unix 2 [ ] STREAM CONNECTED 7767311 /tmp/unicorn.sock
unix 2 [ ] STREAM CONNECTED 7766999 /tmp/unicorn.sock
unix 2 [ ] STREAM CONNECTED 7767309 /tmp/unicorn.sock
unix 2 [ ] STREAM CONNECTED 7766941 /tmp/unicorn.sock
unix 2 [ ] STREAM CONNECTED 7767287 /tmp/unicorn.sock
unix 2 [ ] STREAM CONNECTED 7766225 /tmp/unicorn.sock
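To compare a healthy moment with an outage, the netstat output can be reduced to a count per socket state. A minimal sketch, assuming the line layout shown above; `SOCKET_RE`, `count_states`, and `sample` are illustrative names, and the socket path is the one from the output.

```python
import re
from collections import Counter

# Pulls the state and path out of a "unix ... STREAM <STATE> <inode> <path>" line.
SOCKET_RE = re.compile(
    r"^unix\s.*\bSTREAM\s+(?P<state>[A-Z]+)\s+\d+\s+(?P<path>\S+)\s*$"
)


def count_states(netstat_lines, socket_path="/tmp/unicorn.sock"):
    """Count connections to socket_path, grouped by state."""
    states = Counter()
    for line in netstat_lines:
        match = SOCKET_RE.match(line.strip())
        if match and match.group("path") == socket_path:
            states[match.group("state")] += 1
    return states


# Lines copied from the output above; in practice, feed in real netstat output.
sample = [
    "unix 2 [ ] STREAM CONNECTING 0 /tmp/unicorn.sock",
    "unix 2 [ ] STREAM CONNECTING 0 /tmp/unicorn.sock",
    "unix 2 [ ] STREAM CONNECTED 7768134 /tmp/unicorn.sock",
]

state_counts = count_states(sample)
print(dict(state_counts))
```

A growing CONNECTING count would suggest that clients (presumably Nginx here) are queuing up faster than the workers accept them, so logging this number over time may show whether the queue builds before or after the first timeout.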
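Does anyone know what could be causing this? It happens on multiple servers at the same time.
Answer 0 (score: 0)
DDoS? Before digging into the application level, I'd make sure you know what's happening at the network level. How many connections does it have when it's working? When it's not? From which IPs? Etc.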
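One rough way to answer the "from which IPs" question is to tally requests per client address in the Nginx access log. This sketch assumes the client address is the first whitespace-separated field, as in Nginx's stock "combined" log format; the function name `top_clients` and the sample lines (which use documentation-range addresses) are invented for illustration and are not taken from the question.

```python
from collections import Counter


def top_clients(access_log_lines, limit=10):
    """Return the most frequent client addresses as (address, hits) pairs."""
    hits = Counter()
    for line in access_log_lines:
        fields = line.split()
        if fields:
            # Assumption: first field is the client address.
            hits[fields[0]] += 1
    return hits.most_common(limit)


# Hypothetical sample lines; in practice, iterate over the real access log.
sample = [
    '192.0.2.10 - - [18/Apr/2013:12:57:01 +0000] "GET / HTTP/1.1" 200 512 "-" "-"',
    '192.0.2.10 - - [18/Apr/2013:12:57:02 +0000] "GET / HTTP/1.1" 200 512 "-" "-"',
    '198.51.100.7 - - [18/Apr/2013:12:57:02 +0000] "GET /about HTTP/1.1" 200 128 "-" "-"',
]

busiest = top_clients(sample, limit=1)
print(busiest)
```

Running it over a window from a normal period and a window from an outage gives the comparison. If the nodes sit behind a load balancer, the first field may be the balancer's address rather than the real client, in which case a forwarded-for field would have to be logged and parsed instead.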