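We have been using Embedded Jetty 8.1.15 for our backend server for several years. It ran without problems until recently, when we started load testing with many concurrent users. After that, we were able to reproduce the problem reliably with JMeter (HTTP sampler with a concurrent pool size of 1000 and KeepAlive), even with a small number of users.
Communication between the client and the server goes over TLS, and the client uses a connection pool:
client (connection pool) -> TLS -> server
The behavior we see: at some point, many threads get "stuck" (waiting on a monitor in different methods) with the following stack trace: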
Thread "qtp438123546-99":
at java.security.SecureRandom.nextBytes(byte[ ]) (line: 457)
at sun.security.ssl.RandomCookie.<init>(java.security.SecureRandom) (line: 53)
at sun.security.ssl.ServerHandshaker.clientHello(sun.security.ssl.HandshakeMessage$ClientHello) (line: 522)
at sun.security.ssl.ServerHandshaker.processMessage(byte, int) (line: 213)
at sun.security.ssl.Handshaker.processLoop() (line: 925)
at sun.security.ssl.Handshaker$1.run() (line: 865)
at sun.security.ssl.Handshaker$1.run() (line: 862)
at java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext)
at sun.security.ssl.Handshaker$DelegatedTask.run() (line: 1302)
at org.eclipse.jetty.io.nio.SslConnection.process(org.eclipse.jetty.io.Buffer, org.eclipse.jetty.io.Buffer) (line: 375)
at org.eclipse.jetty.io.nio.SslConnection.access$900(org.eclipse.jetty.io.nio.SslConnection, org.eclipse.jetty.io.Buffer, org.eclipse.jetty.io.Buffer) (line: 48)
at org.eclipse.jetty.io.nio.SslConnection$SslEndPoint.fill(org.eclipse.jetty.io.Buffer) (line: 678)
at org.eclipse.jetty.http.HttpParser.fill() (line: 1044)
at org.eclipse.jetty.http.HttpParser.parseNext() (line: 280)
at org.eclipse.jetty.http.HttpParser.parseAvailable() (line: 235)
at org.eclipse.jetty.server.AsyncHttpConnection.handle() (line: 82)
at org.eclipse.jetty.io.nio.SslConnection.handle() (line: 196)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle() (line: 696)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run() (line: 53)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(java.lang.Runnable) (line: 608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run() (line: 543)
at java.lang.Thread.run() (line: 745)
Sometimes the threads wait in other synchronized methods of the Java security API:
Thread "qtp438123546-993":
at sun.security.ssl.SignatureAndHashAlgorithm.getSupportedAlgorithms(java.security.AlgorithmConstraints) (line: 155)
at sun.security.ssl.Handshaker.getLocalSupportedSignAlgs() (line: 422)
at sun.security.ssl.ServerHandshaker.clientHello(sun.security.ssl.HandshakeMessage$ClientHello) (line: 700)
at sun.security.ssl.ServerHandshaker.processMessage(byte, int) (line: 213)
at sun.security.ssl.Handshaker.processLoop() (line: 925)
at sun.security.ssl.Handshaker$1.run() (line: 865)
at sun.security.ssl.Handshaker$1.run() (line: 862)
at java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext)
at sun.security.ssl.Handshaker$DelegatedTask.run() (line: 1302)
at org.eclipse.jetty.io.nio.SslConnection.process(org.eclipse.jetty.io.Buffer, org.eclipse.jetty.io.Buffer) (line: 375)
at org.eclipse.jetty.io.nio.SslConnection.access$900(org.eclipse.jetty.io.nio.SslConnection, org.eclipse.jetty.io.Buffer, org.eclipse.jetty.io.Buffer) (line: 48)
at org.eclipse.jetty.io.nio.SslConnection$SslEndPoint.fill(org.eclipse.jetty.io.Buffer) (line: 678)
at org.eclipse.jetty.http.HttpParser.fill() (line: 1044)
at org.eclipse.jetty.http.HttpParser.parseNext() (line: 280)
at org.eclipse.jetty.http.HttpParser.parseAvailable() (line: 235)
at org.eclipse.jetty.server.AsyncHttpConnection.handle() (line: 82)
at org.eclipse.jetty.io.nio.SslConnection.handle() (line: 196)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle() (line: 696)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run() (line: 53)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(java.lang.Runnable) (line: 608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run() (line: 543)
at java.lang.Thread.run() (line: 745)
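While everything is working, before the problem appears, a pool of persistent connections is established between the client and the server (visible in netstat). Once the problem appears, however, there are many connections in states other than ESTABLISHED.
On the client side: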
tcp6 0 0 client-host:39962 server-host:443 FIN_WAIT2
tcp6 0 1 client-host:45266 server-host:443 SYN_SENT
tcp6 0 0 client-host:46234 server-host:443 FIN_WAIT2
tcp6 0 0 client-host:38892 server-host:443 FIN_WAIT2
tcp6 0 0 client-host:39160 server-host:443 FIN_WAIT2
tcp6 0 0 client-host:39188 server-host:443 FIN_WAIT2
tcp6 0 1 client-host:43496 server-host:443 SYN_SENT
tcp6 0 0 client-host:46122 server-host:443 FIN_WAIT2
tcp6 0 1 client-host:44938 server-host:443 SYN_SENT
tcp6 0 0 client-host:46446 server-host:443 ESTABLISHED
On the server side:
tcp 0 2980 server-host:443 client-host-1:34964 LAST_ACK
tcp 0 2980 server-host:443 client-host-3:52430 LAST_ACK
tcp 0 2980 server-host:443 client-host-1:35922 LAST_ACK
tcp 0 0 server-host:443 client-host-1:38362 CLOSE_WAIT
tcp 236 0 server-host:443 client-host-3:58296 CLOSE_WAIT
tcp 0 2980 server-host:443 client-host-1:34980 LAST_ACK
tcp 0 2980 server-host:443 client-host-2:55748 LAST_ACK
tcp 0 2980 server-host:443 client-host-3:53376 LAST_ACK
tcp 0 0 server-host:443 client-host-1:40104 SYN_RECV
tcp 0 0 server-host:443 client-host-1:38718 CLOSE_WAIT
tcp 0 2980 server-host:443 client-host-2:54142 LAST_ACK
tcp 0 2980 server-host:443 client-host-1:50766 LAST_ACK
tcp 0 0 server-host:443 client-host-1:38604 CLOSE_WAIT
tcp 236 0 server-host:443 client-host-3:57604 CLOSE_WAIT
tcp 0 2980 server-host:443 client-host-2:55502 LAST_ACK
tcp 0 2980 server-host:443 client-host-2:58254 LAST_ACK
tcp 0 2980 server-host:443 client-host-1:38042 LAST_ACK
tcp 0 0 server-host:443 client-host-1:38222 CLOSE_WAIT
tcp 0 2980 server-host:443 client-host-3:47812 LAST_ACK
tcp 0 2980 server-host:443 client-host-1:60532 LAST_ACK
tcp 0 2980 server-host:443 client-host-2:54282 LAST_ACK
tcp 0 0 server-host:443 client-host-1:40978 SYN_RECV
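Almost all threads are unresponsive, CPU usage is high, and the GC is running constantly.
We also set the following flag on the JVM:
-Djava.security.egd=file:/dev/./urandom
so that SecureRandom does not block (as it can with /dev/random).
java version "1.8.0_05"
Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)
Kernel: 4.14.94-89.73.amzn2.x86_64 (the problem also occurs on systems with kernel 2.6.32-696.20.1.el6.x86_64)
Limits: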
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 151551
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65536
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 16384
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
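When the component enters this state it stops responding, and nothing is written to the log files (we use log4j2, by the way).
After the load on the component stops, it takes several minutes to recover and start responding again.
Has anyone seen similar behavior in a Java backend component? Please comment or suggest directions for investigation and/or a solution.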
Answer 0: (score: 0)
SecureRandom tries to read random bytes from a source of randomness provided by the operating system (e.g. /dev/random), but this operation can hang if the system does not have enough entropy available.
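One workaround is to use a different source of randomness that does not block (e.g. /dev/urandom). This can be configured by updating $JAVA_HOME/jre/lib/security/java.security:
securerandom.source=file:/dev/./urandom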
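To confirm which settings a running JVM actually picked up, a small check like the one below may help. It is only a sketch: the class name RandomSourceCheck is made up for illustration, and it reports what the JVM sees rather than proving that reads never block.

```java
import java.security.SecureRandom;
import java.security.Security;

public class RandomSourceCheck {
    /** Summarizes the randomness-related settings visible to this JVM. */
    static String describe() {
        // Set via -Djava.security.egd=...; null if the flag was not passed.
        String egd = System.getProperty("java.security.egd");
        // Value of securerandom.source from the java.security file.
        String source = Security.getProperty("securerandom.source");
        // The default SecureRandom, i.e. what "new SecureRandom()" returns.
        SecureRandom rng = new SecureRandom();
        return "java.security.egd=" + egd
                + "\nsecurerandom.source=" + source
                + "\nalgorithm=" + rng.getAlgorithm()
                + "\nprovider=" + rng.getProvider().getName();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it with the same JVM flags as the server shows whether the system property and the security property agree with what was intended.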
Another option is to install Haveged, which can speed up reads from /dev/random.
This bug report has additional details: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6521844
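The question describes threads waiting on a monitor inside SecureRandom.nextBytes, which is a synchronized method in JDK 8. One way to check whether the stall is contention on a shared instance rather than a lack of entropy is to time many threads calling one SecureRandom. The following is a rough probe under stated assumptions: the class name SharedRandomProbe, the thread counts, and the 32-byte buffer size are arbitrary choices for illustration, not values taken from Jetty or JSSE.

```java
import java.security.SecureRandom;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class SharedRandomProbe {
    /** Runs nextBytes on one shared SecureRandom from many threads; returns the number of completed calls. */
    static long run(int threads, int callsPerThread) {
        SecureRandom shared = new SecureRandom();
        AtomicLong completed = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                byte[] buf = new byte[32]; // arbitrary size, for illustration only
                for (int i = 0; i < callsPerThread; i++) {
                    shared.nextBytes(buf); // all threads go through the same instance
                    completed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.MINUTES)) {
                throw new IllegalStateException("probe timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(threads + " threads, " + completed.get()
                + " calls, " + elapsedMs + " ms");
        return completed.get();
    }

    public static void main(String[] args) {
        run(1, 200_000);       // single-threaded baseline
        run(64, 200_000 / 64); // roughly the same total work, spread over 64 threads
    }
}
```

If the multi-threaded run is much slower than the baseline for the same total number of calls, and a thread dump taken during the run shows the same waiting frames as in the question, lock contention is a plausible contributor. If even the single-threaded run stalls, the randomness source itself is the more likely suspect.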