Amazon ELB fails to serve responses

Posted: 2012-06-14 20:51:35

Tags: tomcat amazon-web-services load-balancing amazon-elb

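I have a website running on Amazon Web Services, deployed with Elastic Beanstalk on a single EC2 micro instance. It is a staging environment, and I am the only one who can access it. Using Apache JMeter, I simulate six users navigating the site, averaging one request every 3 seconds (images, CSS, JS and other static resources are served by CloudFront and generate no traffic on the EC2 instance).

The problem is that after some time (usually 30-60 minutes after setting up the environment), the website stops responding. I am sure Tomcat is still running correctly, because I can see in the log (catalina.out) that the cron jobs are still being executed. It seems that only the ELB is unable to serve responses.

Looking at the logs, there are no errors at all on the Tomcat side (nothing in /opt/tomcat7/logs/tail_catalina.log or /opt/tomcat7/logs/catalina.out). As soon as the website becomes unreachable, the following errors start appearing in /etc/httpd/logs/elasticbeanstalk-error_log: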

[Thu Jun 14 20:26:42 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:26:42 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:26:50 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:26:50 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:27:20 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:27:20 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:27:43 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:27:43 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:27:50 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:27:50 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:28:20 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:28:20 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:28:42 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:28:42 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:28:50 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:28:50 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:29:20 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:29:20 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:29:42 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:29:42 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:29:50 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:29:50 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:30:20 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:30:20 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:30:43 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:30:43 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:30:50 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:30:50 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:31:20 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:31:20 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:31:43 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:31:43 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:31:50 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:31:50 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)
[Thu Jun 14 20:32:20 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:32:20 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)

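... until the EC2 instance is eventually terminated (and a new one is started automatically).

If I make no requests (or fewer of them), the problem does not occur.

Any help is greatly appreciated.

Thanks!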

2 answers:

Answer 0: (score: 7)

Let me start with an assumption:

  • Your Tomcat application is supposed to be listening on 127.0.0.1:8999

If that is true, then the log events:

[Thu Jun 14 20:26:42 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:26:42 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)

.. suggest that the application listener has died. You can confirm this with:

curl -v http://127.0.0.1:8999/

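While the website is working, the curl command should return a valid HTTP response. During an outage it will probably return "Connection refused" or "couldn't connect to host". You can also check for a valid listener on the application port with: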

netstat -an | grep LISTEN | grep 8999
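The curl check above can be wrapped in a small helper that turns curl's exit status into a diagnosis. This is only a sketch: classify_curl_status and probe_backend are hypothetical names, and the default URL is simply the address taken from the Apache error log. It relies on curl's documented exit codes (7 for a failed connection, 28 for a timeout).

```shell
# Map a curl exit status to a short diagnosis.
# classify_curl_status is a hypothetical helper name used for illustration.
classify_curl_status() {
  case "$1" in
    0)  echo "listener is answering" ;;
    7)  echo "connection failed: nothing is accepting connections" ;;
    28) echo "timed out: listener may be hung" ;;
    *)  echo "curl exit status $1: see the curl man page" ;;
  esac
}

# Probe a backend URL once and print the diagnosis.
# The default URL is an assumption based on the address in the error log.
probe_backend() {
  curl --silent --output /dev/null --max-time 5 "${1:-http://127.0.0.1:8999/}"
  classify_curl_status "$?"
}
```

Running probe_backend on the instance during an outage should then print the "connection failed" line if the listener really has gone away. Note that an HTTP error page (for example a 404) still counts as "answering" here, because curl exits with 0 unless --fail is given.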

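There are many reasons why an application listener might die, including but not limited to:

  • A hard crash of the JVM (use ps to see whether the JVM process is still running)
  • A soft crash of the application (check the Tomcat application logs)
  • Running out of file descriptors (use lsof | wc -l and compare the result with ulimit -n for the application user)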
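For the file descriptor case, the comparison can be scripted. The sketch below takes the open-descriptor count and the limit as arguments and reports how much headroom is left. fd_headroom is a hypothetical helper name, and the 90% warning threshold is an arbitrary assumption, not a recommended value.

```shell
# Report how close a process is to its open-file limit.
# fd_headroom is a hypothetical helper: $1 is the number of open
# descriptors, $2 is the limit (must be a number, not "unlimited").
fd_headroom() {
  open="$1"
  limit="$2"
  pct=$(( open * 100 / limit ))   # integer percentage, rounded down
  if [ "$pct" -ge 90 ]; then      # 90 is an arbitrary warning threshold
    echo "WARN ${open}/${limit} descriptors in use (${pct}%)"
  else
    echo "OK ${open}/${limit} descriptors in use (${pct}%)"
  fi
}
```

On Linux, one way to feed it is fd_headroom "$(ls /proc/PID/fd | wc -l)" "$(ulimit -n)", with PID replaced by the JVM's process ID and the command run as the application user so that ulimit -n reflects that user's limit.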

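However, most of these failures should result in an error message being written to the JVM process's stderr, which is usually logged. That is the best place to look. If all else fails, you may want to try running the Tomcat application in the foreground with debug logging enabled.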

Answer 1: (score: 1)

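I have just spent a day battling an issue similar to this one. I have a WAR file deployed to an Amazon Elastic Beanstalk environment. Unlike yours, the instances in my AEBS environment lasted only 5 minutes before being terminated and replaced with new instances by AEBS.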

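After quite a bit of digging (in the 5 minutes my instances stayed alive) and some light reading, I discovered that the AEBS Tomcat instances are created with Apache receiving requests on port 80. Requests to /_hostmanager are rerouted to port 8999, and everything else goes to port 8080 (Tomcat). A Ruby application called "hostmanager", deployed to the instance, listens on port 8999. This application presumably reports traffic and other statistics back to the AWS Elastic Beanstalk Host Manager, allowing the Elastic Beanstalk environment to get a picture of the load on the environment and scale the number of instances up or down as appropriate.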
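To make the routing rule concrete, here is a sketch of it as a shell function. This is only an illustration of the behaviour described above, not the Apache configuration that Elastic Beanstalk actually ships; backend_port_for_path is a hypothetical name, and the exact path matching is an assumption.

```shell
# Return the local port that a request path would be forwarded to,
# following the rule described in this answer:
#   /_hostmanager (and anything under it) -> 8999 (hostmanager)
#   everything else                       -> 8080 (Tomcat)
# backend_port_for_path is a hypothetical helper for illustration only.
backend_port_for_path() {
  case "$1" in
    /_hostmanager|/_hostmanager/*) echo 8999 ;;
    *)                             echo 8080 ;;
  esac
}
```

Seen this way, a "Connection refused" on 127.0.0.1:8999 points at the hostmanager side of the split rather than at Tomcat on 8080.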

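If the AWS Elastic Beanstalk Host Manager does not get responses from the instance's hostmanager application, it terminates the instance and starts a new one. That is probably why your website lasts 30 minutes and then dies.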

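So I guess the problem here is not with your Java application being served on port 8080, but rather with the hostmanager application not listening on port 8999. That is presumably the cause of:

[Thu Jun 14 20:26:42 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:26:42 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)

Have a look in /opt/elasticbeanstalk/var/log/hostmanager.log, as it might give you more clues about what is happening and why the hostmanager is unhappy.

In my case it turned out that my hostmanager application was making a request to an Amazon S3 bucket and getting a 404 response (I found this out by looking in the hostmanager.log mentioned above). This prevented the hostmanager from starting. So when incoming requests were rerouted to port 8999, nothing was listening. Fail. Instance terminated.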
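Since the instance may only live for a few minutes, it can help to have a one-liner ready for pulling suspicious lines out of hostmanager.log. The sketch below is an assumption-heavy example: scan_hostmanager_log is a hypothetical helper, and the search patterns are guesses, because the log's format is not shown in this thread.

```shell
# Print numbered lines from a log file that look like failures.
# scan_hostmanager_log is a hypothetical helper; the patterns below are
# guesses, since the hostmanager log format is not shown in this thread.
scan_hostmanager_log() {
  logfile="${1:-/opt/elasticbeanstalk/var/log/hostmanager.log}"
  if [ ! -r "$logfile" ]; then
    echo "cannot read $logfile" >&2
    return 1
  fi
  # -i: case-insensitive, -n: show line numbers, -E: extended regex
  grep -inE 'error|fail|exception|404' "$logfile"
}
```

Called with no argument it reads the hostmanager.log path given in this answer; pass another file name to scan a copy saved from a terminated instance.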

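Rather than trying to work out exactly why the hostmanager application was failing, I decided to blame the AMI that the Elastic Beanstalk environment was using. In the end I ditched it and followed these steps to bring up a new Elastic Beanstalk environment with a custom AMI:

  1. Create a new Elastic Beanstalk environment using my WAR file
  2. Create an AMI from the instance it creates
  3. Create a regular EC2 instance from the AMI created in step 2
  4. Add the extra bits I needed (e.g. Tomcat manager)
  5. Create an AMI from the regular instance created in step 3
  6. Apply the AMI to the Elastic Beanstalk environment

Without knowing exactly what your setup is, it is hard to help you precisely. Hopefully, though, knowing that the hostmanager listens on port 8999, knowing the location of hostmanager.log, and a bit of luck will get you where you want to be!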