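I have two r5a.xlarge EC2 instances in my environment. Each instance has 4 vCPUs and 32 GiB of memory.
The application processes some files and returns the data to the client as JSON. Two of the files it processes are large (about 1.5 GB each). There is no database connection. The application uses Python 3.6 with Flask and runs on an Apache server.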
After a few incoming requests, the instance goes into the "Degraded" state. The causes shown are:
25.0 % of the requests are failing with HTTP 5xx.
Instance ELB health state has been "OutOfService" for 1 hour 23 minutes: Instance has failed at least the UnhealthyThreshold number of health checks consecutively.
99 % of CPU is in use. 96 % in I/O wait.
100 % of memory is in use.
The instance stays in this state even after incoming requests stop.
The other instance has, for some reason, deployed the wrong version:
Incorrect application version "app-xxxxxxx" (deployment 24). Expected version "app-yyyyyy" (deployment 23).
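I set the load balancer capacity to 0, which removed both instances. I redeployed the application and then set the capacity back to the original settings: Min = 1, Max = 2, Desired = 2.
I did this so that the environment would get new instances with the correct version of the code base.
Now 1 instance is running, and after 7-8 requests it goes into the Degraded state again. The cause is again: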
100 % of memory is in use.
I have tried creating swap space as described here. I even checked the httpd_error log, but found no errors related to this. These are all the entries in the httpd_error file:
[suexec:notice] [pid 2880] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[http2:warn] [pid 2880] AH10034: The mpm module (prefork.c) is not supported by mod_http2. The mpm determines how things are processed in your server. HTTP/2 has more demands in this regard and the currently selected mpm will just not do. This is an advisory warning. Your server will continue to work, but the HTTP/2 protocol will be inactive.
[http2:warn] [pid 2880] AH02951: mod_ssl does not seem to be enabled
[lbmethod_heartbeat:notice] [pid 2880] AH02282: No slotmem from mod_heartmonitor
[:warn] [pid 2880] mod_wsgi: Compiled for Python/3.6.2.
[:warn] [pid 2880] mod_wsgi: Runtime using Python/3.6.12.
[mpm_prefork:notice] [pid 2880] AH00163: Apache/2.4.46 (Amazon) mod_wsgi/3.5 Python/3.6.12 configured -- resuming normal operations
How do I even begin to troubleshoot this?
Answer 0 (score: 0)
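The problem was in the application. I had to make some major changes to the way the application reads and processes data. The application was reading a large amount of data on every request, which overloaded the server and caused it to hang.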
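
The answer does not say what the changes were. One common way to avoid re-reading a large file on every request is to load it once per process and reuse the parsed result. The following is only a minimal sketch of that idea, not the actual fix: the names `load_dataset` and `handle_request` are hypothetical, the file is assumed to be JSON, and `handle_request` stands in for a Flask view function so the example runs with the standard library alone.

```python
import functools
import json
import os
import tempfile


@functools.lru_cache(maxsize=None)
def load_dataset(path):
    """Parse the file at `path` once per process and cache the result.

    Hypothetical helper: assumes the data file is JSON.
    """
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def handle_request(path, key):
    """Stand-in for a Flask view: answer from the cached data.

    Only the first call in a process reads the file; later calls
    reuse the object already held in memory.
    """
    data = load_dataset(path)
    return json.dumps({key: data.get(key)})


if __name__ == "__main__":
    # Tiny demo file; the real files in the question are about 1.5 GB.
    fd, demo_path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"a": 1, "b": 2}, fh)

    print(handle_request(demo_path, "a"))
    print(handle_request(demo_path, "b"))
    # cache_info() shows how many calls were served from memory.
    print(load_dataset.cache_info())
    os.remove(demo_path)
```

Note that the cache is per process. The log above shows Apache running the prefork MPM, so each worker process that handles requests would hold its own copy of the data, and the number of processes still has to fit within the instance's memory. If the parsed data is too large to keep resident at all, reading only the needed portion of the file per request is an alternative.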