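I have a Rails application that runs on a web server on an EC2 instance (m1.small with Amazon Linux) and uses an AWS RDS/MySQL database (t1.micro). This setup works like a charm (as of this morning, uptime over the past 30 days was about 99.9%).
But occasionally the application stalls for about 14 minutes (the application is monitored by Pingdom). When it happens, it usually happens in batches; today I have already had the problem 4 times. When I am fast enough, I can log in to the server, install gdb, and attach the debugger to the web server. The top of the stack then looks like this: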
thread 1.
(gdb) bt
#0 0x00007fafa28b154d in read () from /lib64/libpthread.so.0
#1 0x00007faf98736332 in ?? () from /usr/lib64/mysql/libmysqlclient.so.18
#2 0x00007faf9872841f in ?? () from /usr/lib64/mysql/libmysqlclient.so.18
#3 0x00007faf98728ffa in ?? () from /usr/lib64/mysql/libmysqlclient.so.18
#4 0x00007faf98722615 in ?? () from /usr/lib64/mysql/libmysqlclient.so.18
#5 0x00007faf98726254 in ?? () from /usr/lib64/mysql/libmysqlclient.so.18
#6 0x00007faf9871e30d in mysql_ping () from /usr/lib64/mysql/libmysqlclient.so.18
#7 0x00007faf98be1aed in nogvl_ping (ptr=0x47a1ec0) at client.c:627
#8 0x00007fafa2c59c29 in rb_thread_blocking_region () from /home/ec2-user/.rvm/rubies/ruby-1.9.3-p327/lib/libruby.so.1.9
#9 0x00007faf98be1b5d in rb_mysql_client_ping (self=70801240) at client.c:636
#10 0x00007fafa2c3f108 in call_cfunc () from /home/ec2-user/.rvm/rubies/ruby-1.9.3-p327/lib/libruby.so.1.9
#11 0x00007fafa2c3fa0d in vm_call_cfunc () from /home/ec2-user/.rvm/rubies/ruby-1.9.3-p327/lib/libruby.so.1.9
#12 0x00007fafa2c400d3 in vm_call_method () from /home/ec2-user/.rvm/rubies/ruby-1.9.3-p327/lib/libruby.so.1.9
#13 0x00007fafa2c45987 in vm_exec_core () from /home/ec2-user/.rvm/rubies/ruby-1.9.3-p327/lib/libruby.so.1.9
#14 0x00007fafa2c52d2a in vm_exec () from /home/ec2-user/.rvm/rubies/ruby-1.9.3-p327/lib/libruby.so.1.9
#15 0x00007fafa2c516af in invoke_block_from_c () from /home/ec2-user/.rvm/rubies/ruby-1.9.3-p327/lib/libruby.so.1.9
#16 0x00007fafa2c517c5 in vm_yield () from /home/ec2-user/.rvm/rubies/ruby-1.9.3-p327/lib/libruby.so.1.9
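The MySQL version is 5.5. There are no entries in the database logs provided by AWS. The Rails log just shows a 14-minute gap (xxx/auto_test is the URL that the AWS load balancer checks every 10 seconds to test the instance):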
Started GET "/xxx/auto_test" for 10.224.95.251 at 2013-02-06 17:59:32 +0000
Processing by HealthCheckController#status as */*
Rendered health_check/status.html.erb within layouts/application (0.1ms)
Rendered layouts/_render_flash.html.erb (0.1ms)
Rendered layouts/_debug_info.html.erb (0.0ms)
Completed 200 OK in 8ms (Views: 6.0ms | ActiveRecord: 1.3ms)
Started GET "/xxx/auto_test" for 10.224.95.251 at 2013-02-06 18:13:38 +0000
Processing by HealthCheckController#status as */*
Rendered health_check/status.html.erb within layouts/application (0.1ms)
Rendered layouts/_render_flash.html.erb (0.1ms)
Rendered layouts/_debug_info.html.erb (0.0ms)
Completed 200 OK in 7ms (Views: 5.5ms | ActiveRecord: 1.2ms)
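The requests from the load balancer that arrive during the outage pile up and are only answered once the database is no longer blocking.
What could cause the database to block? What information do I need to look for to resolve this? Any suggestions for a workaround? Any pointers are welcome and highly appreciated!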
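To locate such stalls in a larger log without reading it by eye, the gaps between consecutive "Started" lines can be computed. The following is only a minimal sketch: the helper name `find_gaps` and the 60-second threshold are assumptions for illustration, and the sample input is the log excerpt above.

```ruby
require "time"

# Captures the timestamp at the end of a Rails "Started GET ..." log line.
STARTED_AT = /^Started \S+ ".*" for \S+ at (.+)$/

# Returns [previous_time, next_time, gap_in_seconds] for every pair of
# consecutive requests that are more than `threshold` seconds apart.
# `find_gaps` is a hypothetical helper, not part of Rails.
def find_gaps(lines, threshold = 60)
  stamps = lines.map { |line| line[STARTED_AT, 1] }.compact
  times = stamps.map { |stamp| Time.parse(stamp) }
  times.each_cons(2)
       .map { |a, b| [a, b, b - a] }
       .select { |_, _, gap| gap > threshold }
end

log = <<~LOG
  Started GET "/xxx/auto_test" for 10.224.95.251 at 2013-02-06 17:59:32 +0000
  Completed 200 OK in 8ms (Views: 6.0ms | ActiveRecord: 1.3ms)
  Started GET "/xxx/auto_test" for 10.224.95.251 at 2013-02-06 18:13:38 +0000
  Completed 200 OK in 7ms (Views: 5.5ms | ActiveRecord: 1.2ms)
LOG

find_gaps(log.lines).each do |from, to, gap|
  puts "no requests started between #{from.utc} and #{to.utc} (#{gap.round} s)"
end
```

With health checks expected every 10 seconds, any gap well above that interval marks a window worth correlating with the Pingdom alerts.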
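Update
I saw the problem again today. The outage lasted 14 minutes; I attached a debugger and got the same backtrace. So using the native MySQL timeouts did not solve the problem.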
Running 'iptables -L' did not show anything interesting either.
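Answer 0: (score: 2)
This may be a firewall problem. You should check that there is no script on your system that reconfigures the firewall so that connections to MySQL are blocked for 15 minutes. You can verify this hypothesis by running 'iptables -L' during the time window in which the MySQL connection misbehaves.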
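As a complement to inspecting the rules, one could probe the MySQL port from the web server while the stall is in progress. The sketch below uses only Ruby's standard library; the helper name `port_reachable?`, the 3-second timeout, and the host name in the usage comment are assumptions for illustration. Note that it only tests whether a new TCP connection can be opened, so a positive result does not prove that the already established connection is healthy.

```ruby
require "socket"

# Returns true if a TCP connection to host:port can be opened within
# `timeout` seconds, false if it is refused, unreachable, or times out.
# `port_reachable?` is a hypothetical helper name.
def port_reachable?(host, port, timeout = 3)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# Example usage (placeholder host; 3306 is the default MySQL port):
#   loop do
#     puts "#{Time.now.utc} reachable=#{port_reachable?('db.example.invalid', 3306)}"
#     sleep 10
#   end
```

If the probe keeps succeeding while the application is stuck in `mysql_ping`, that points away from a rule blocking new connections and toward a problem on the existing connection.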
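Answer 1: (score: 2)
It looks like you are running into something similar to what is described here: https://github.com/brianmario/mysql2/pull/287
Are you by any chance using the mysql2 adapter? If not, could you try it and set the read_timeout option to see whether that changes anything?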
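For reference, here is a rough sketch of what such a configuration might look like, written as a Ruby hash rendered to database.yml-style YAML. The helper `with_timeouts`, the host and database names, and the timeout values are all placeholders. `read_timeout` and `connect_timeout` are mysql2 connection options given in seconds. Whether they help in this case is not established, since the update in the question reports that native MySQL timeouts did not remove the stall.

```ruby
require "yaml"

# Returns a copy of a database.yml-style settings hash with the mysql2
# adapter selected and client-side timeouts (in seconds) added.
# `with_timeouts` is a hypothetical helper; the defaults are illustrative.
def with_timeouts(settings, read_timeout = 10, connect_timeout = 5)
  settings.merge(
    "adapter" => "mysql2",
    "read_timeout" => read_timeout,
    "connect_timeout" => connect_timeout
  )
end

# Placeholder production settings, not taken from the question.
production = {
  "host" => "db.example.invalid",
  "database" => "app_production"
}

# Prints a YAML fragment that could be pasted into config/database.yml.
puts({ "production" => with_timeouts(production) }.to_yaml)
```

A read timeout shorter than the load balancer's health-check tolerance would at least turn a silent 14-minute hang into an error that shows up in the Rails log.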