Java Security API同步方法导致应用程序线程在高负载下挂起

时间:2019-05-06 21:09:07

标签: multithreading ssl jetty secure-random end-of-life

我们的后端服务器使用Embedded Jetty 8.1.15已有几年了。直到最近,当我们开始对很多并发用户进行负载测试时,它才没有任何问题。然后,即使使用JMeter(HTTP采样器的并发池数为1000和KeepAlive),即使用户数量很少,我们也可以成功重现该问题。客户端和服务器之间的通信通过TLS(客户端具有连接池) 客户端(连接池)-> TLS->服务器 我们看到的行为-在某些时间点,许多线程被“卡死”(在监视器上以不同的方法等待),并带有以下堆栈跟踪-

Thread "qtp438123546-99":
at java.security.SecureRandom.nextBytes(byte[ ]) (line: 457)
at sun.security.ssl.RandomCookie.<init>(java.security.SecureRandom) (line: 53)
at sun.security.ssl.ServerHandshaker.clientHello(sun.security.ssl.HandshakeMessage$ClientHello) (line: 522)
at sun.security.ssl.ServerHandshaker.processMessage(byte, int) (line: 213)
at sun.security.ssl.Handshaker.processLoop() (line: 925)
at sun.security.ssl.Handshaker$1.run() (line: 865)
at sun.security.ssl.Handshaker$1.run() (line: 862)
at java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext)
at sun.security.ssl.Handshaker$DelegatedTask.run() (line: 1302)
at org.eclipse.jetty.io.nio.SslConnection.process(org.eclipse.jetty.io.Buffer, org.eclipse.jetty.io.Buffer) (line: 375)
at org.eclipse.jetty.io.nio.SslConnection.access$900(org.eclipse.jetty.io.nio.SslConnection, org.eclipse.jetty.io.Buffer, org.eclipse.jetty.io.Buffer) (line: 48)
at org.eclipse.jetty.io.nio.SslConnection$SslEndPoint.fill(org.eclipse.jetty.io.Buffer) (line: 678)
at org.eclipse.jetty.http.HttpParser.fill() (line: 1044)
at org.eclipse.jetty.http.HttpParser.parseNext() (line: 280)
at org.eclipse.jetty.http.HttpParser.parseAvailable() (line: 235)
at org.eclipse.jetty.server.AsyncHttpConnection.handle() (line: 82)
at org.eclipse.jetty.io.nio.SslConnection.handle() (line: 196)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle() (line: 696)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run() (line: 53)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(java.lang.Runnable) (line: 608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run() (line: 543)
at java.lang.Thread.run() (line: 745)

有时会在其他Java安全同步的API方法上使用

Thread "qtp438123546-993":
at sun.security.ssl.SignatureAndHashAlgorithm.getSupportedAlgorithms(java.security.AlgorithmConstraints) (line: 155)
at sun.security.ssl.Handshaker.getLocalSupportedSignAlgs() (line: 422)
at sun.security.ssl.ServerHandshaker.clientHello(sun.security.ssl.HandshakeMessage$ClientHello) (line: 700)
at sun.security.ssl.ServerHandshaker.processMessage(byte, int) (line: 213)
at sun.security.ssl.Handshaker.processLoop() (line: 925)
at sun.security.ssl.Handshaker$1.run() (line: 865)
at sun.security.ssl.Handshaker$1.run() (line: 862)
at java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext)
at sun.security.ssl.Handshaker$DelegatedTask.run() (line: 1302)
at org.eclipse.jetty.io.nio.SslConnection.process(org.eclipse.jetty.io.Buffer, org.eclipse.jetty.io.Buffer) (line: 375)
at org.eclipse.jetty.io.nio.SslConnection.access$900(org.eclipse.jetty.io.nio.SslConnection, org.eclipse.jetty.io.Buffer, org.eclipse.jetty.io.Buffer) (line: 48)
at org.eclipse.jetty.io.nio.SslConnection$SslEndPoint.fill(org.eclipse.jetty.io.Buffer) (line: 678)
at org.eclipse.jetty.http.HttpParser.fill() (line: 1044)
at org.eclipse.jetty.http.HttpParser.parseNext() (line: 280)
at org.eclipse.jetty.http.HttpParser.parseAvailable() (line: 235)
at org.eclipse.jetty.server.AsyncHttpConnection.handle() (line: 82)
at org.eclipse.jetty.io.nio.SslConnection.handle() (line: 196)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle() (line: 696)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run() (line: 53)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(java.lang.Runnable) (line: 608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run() (line: 543)
at java.lang.Thread.run() (line: 745)

一切正常后,在问题出现之前,客户端和服务器之间就建立了具有持久连接的连接池(可以在netstat中看到),但是当问题出现时,存在许多处于不同状态的连接,其他建立的:

在客户端:

tcp6       0      0 client-host:39962       server-host:443         FIN_WAIT2
tcp6       0      1 client-host:45266       server-host:443         SYN_SENT
tcp6       0      0 client-host:46234       server-host:443         FIN_WAIT2
tcp6       0      0 client-host:38892       server-host:443         FIN_WAIT2
tcp6       0      0 client-host:39160       server-host:443         FIN_WAIT2
tcp6       0      0 client-host:39188       server-host:443         FIN_WAIT2
tcp6       0      1 client-host:43496       server-host:443         SYN_SENT
tcp6       0      0 client-host:46122       server-host:443         FIN_WAIT2
tcp6       0      1 client-host:44938       server-host:443         SYN_SENT
tcp6       0      0 client-host:46446       server-host:443         ESTABLISHED

在服务器端:

tcp        0   2980 server-host:443         client-host-1:34964       LAST_ACK
tcp        0   2980 server-host:443         client-host-3:52430       LAST_ACK
tcp        0   2980 server-host:443         client-host-1:35922       LAST_ACK
tcp        0      0 server-host:443         client-host-1:38362       CLOSE_WAIT
tcp      236      0 server-host:443         client-host-3:58296       CLOSE_WAIT
tcp        0   2980 server-host:443         client-host-1:34980       LAST_ACK
tcp        0   2980 server-host:443         client-host-2:55748       LAST_ACK
tcp        0   2980 server-host:443         client-host-3:53376       LAST_ACK
tcp        0      0 server-host:443         client-host-1:40104       SYN_RECV
tcp        0      0 server-host:443         client-host-1:38718       CLOSE_WAIT
tcp        0   2980 server-host:443         client-host-2:54142       LAST_ACK
tcp        0   2980 server-host:443         client-host-1:50766       LAST_ACK
tcp        0      0 server-host:443         client-host-1:38604       CLOSE_WAIT
tcp      236      0 server-host:443         client-host-3:57604       CLOSE_WAIT
tcp        0   2980 server-host:443         client-host-2:55502       LAST_ACK
tcp        0   2980 server-host:443         client-host-2:58254       LAST_ACK
tcp        0   2980 server-host:443         client-host-1:38042       LAST_ACK
tcp        0      0 server-host:443         client-host-1:38222       CLOSE_WAIT
tcp        0   2980 server-host:443         client-host-3:47812       LAST_ACK
tcp        0   2980 server-host:443         client-host-1:60532       LAST_ACK
tcp        0   2980 server-host:443         client-host-2:54282       LAST_ACK
tcp        0      0 server-host:443         client-host-1:40978       SYN_RECV

几乎所有线程都没有响应,CPU很高,GC一直在工作

enter image description here

enter image description here

enter image description here

我们还在JVM中设置了以下标志:

-Djava.security.egd = file:/ dev /./ urandom

为了使SecureRandom处于非阻塞状态(与/ dev / random相反)

java版本“ 1.8.0_05” Java(TM)SE运行时环境(内部版本1.8.0_05-b13) Java HotSpot(TM)64位服务器VM(内部版本25.5-b02,混合模式)

内核:4.14.94-89.73.amzn2.x86_64(但在具有内核2.6.32-696.20.1.el6.x86_64的系统上也会出现此问题)

限制:

core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 151551
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 16384
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

当组件进入该状态时,它没有响应,日志未写入日志文件(顺便说一句,我们使用log4j2)

当组件上的负载停止时,它需要几分钟的时间才能恢复并重新响应

有人在您的Java后端组件中有类似的行为吗? 请评论/建议调查和/或解决方案的方向

1 个答案:

答案 0 :(得分:0)

="https://s3-ap-southeast-2.amazonaws.com/my-bucket/"&A1 尝试从操作系统提供的随机性源(例如SecureRandom)中读取随机字节,但是,如果系统没有足够的熵可用,则此操作可能会挂起。

一种解决方法是使用另一个不会阻塞的随机性源(例如/dev/random)。可以通过更新/dev/urandom来配置:

$JAVA_HOME/jre/lib/security/java.security

另一种选择是安装Haveged,这可以加快从securerandom.source=file:/dev/./urandom 的读取速度。

此错误报告具有其他详细信息:https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6521844