其行为与此question相同,JVM无限期地试图达到安全点,但是在jstack -F
之后,JVM继续执行。我们的Linux服务器版本是CentOS 7.5.1804,内核为3.10.0-123.9.3.el7.x86_64,似乎不在受影响的版本中。
最近这种行为出现了两次,过去还可以。我找不到原因。
GC日志:
2019-01-19T07:36:30.567+0800: 141192.890: Application time: 1.2990830 seconds
2019-01-19T07:36:30.569+0800: 141192.893: Total time for which application threads were stopped: 0.0022680 seconds
2019-01-19T07:36:30.570+0800: 141192.894: Application time: 0.0008920 seconds
2019-01-19T07:36:30.571+0800: 141192.895: Total time for which application threads were stopped: 0.0015040 seconds
2019-01-19T07:36:35.078+0800: 141197.402: Application time: 4.5068430 seconds
############ After jstack -F ###########
2019-01-19T13:47:33.187+0800: 163455.511: Total time for which application threads were stopped: 22258.1091890 seconds
2019-01-19T13:47:33.188+0800: 163455.512: Application time: 0.0005320 seconds
2019-01-19T13:47:33.189+0800: 163455.513: Total time for which application threads were stopped: 0.0014990 seconds
2019-01-19T13:47:33.189+0800: 163455.513: Application time: 0.0000790 seconds
2019-01-19T13:47:33.191+0800: 163455.515: [Full GC2019-01-19T13:47:33.191+0800: 163455.515: [CMS: 446763K->442340K(7707072K), 1.0334000 secs] 587810K->442340K(8320512K), [CMS Perm : 88142K->88142K(131072K)], 1.0341410 secs] [Times: user=1.14 sys=0.00, real=1.04 secs]
2019-01-19T13:47:34.225+0800: 163456.549: Total time for which application threads were stopped: 1.0358560 seconds
安全点统计日志:
# SafepointSynchronize::begin: Timeout detected:
# SafepointSynchronize::begin: Timed out while waiting for threads to stop.
# SafepointSynchronize::begin: Threads which did not reach the safepoint:
# "NettyClientWorkerThread_2" daemon prio=10 tid=0x00007fec701fb000 nid=0x1d0c runnable [0x00007feb48a52000]
java.lang.Thread.State: RUNNABLE
# "NioSocketAcceptor-34" prio=10 tid=0x00007fec50014800 nid=0x1bac runnable [0x00007feb5c58c000]
java.lang.Thread.State: RUNNABLE
# "TP-Processor4" daemon prio=10 tid=0x00007fec987ac000 nid=0x1aba runnable [0x00007feb6a3ce000]
java.lang.Thread.State: RUNNABLE
# "http-8050-Acceptor-0" daemon prio=10 tid=0x00007fec9867b000 nid=0x1ab1 runnable [0x00007fec4416a000]
java.lang.Thread.State: RUNNABLE
# "MasterListener-cmpp3-[192.168.11.11:26383]" daemon prio=10 tid=0x00007fec99b06000 nid=0x1a83 runnable [0x00007feb6d696000]
java.lang.Thread.State: RUNNABLE
...... (lots of jredis thread and memcached thread.)
############ After jstack -F ###########
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
141189.062: RevokeBias [ 1421 0 0 ] [ 0 0 0 1 0 ] 0
141189.609: RevokeBias [ 1421 0 0 ] [ 0 0 0 1 0 ] 0
141191.594: RevokeBias [ 1421 0 0 ] [ 0 0 0 1 0 ] 0
141192.891: RevokeBias [ 1421 0 0 ] [ 0 0 0 1 0 ] 0
141192.891: RevokeBias [ 1421 0 0 ] [ 0 0 0 1 0 ] 0
141197.406: RevokeBias [ 1421 0 1 ] [ 0 22258106 22258107 1 0 ] 0
163455.516: RevokeBias [ 1421 0 0 ] [ 0 0 0 1 0 ] 0
163455.516: GenCollectFull [ 1421 0 2 ] [ 0 0 0 0 1034 ] 0
163456.547: RevokeBias [ 1421 0 0 ] [ 0 0 0 1 0 ] 0
163456.547: RevokeBias [ 1421 0 1 ] [ 0 0 0 1 0 ] 0
我发现所有安全点超时线程都是IN_NATIVE(jstack -F)。例如:
Thread 6787: (state = IN_NATIVE)
- java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[], int, int, int) @bci=0 (Compiled frame; information may be imprecise)
- java.net.SocketInputStream.read(byte[], int, int, int) @bci=87, line=152 (Compiled frame)
- java.net.SocketInputStream.read(byte[], int, int) @bci=11, line=122 (Compiled frame)
- java.net.SocketInputStream.read(byte[]) @bci=5, line=108 (Compiled frame)
- redis.clients.util.RedisInputStream.ensureFill() @bci=20, line=196 (Compiled frame)
- redis.clients.util.RedisInputStream.readByte() @bci=1, line=40 (Compiled frame)
- redis.clients.jedis.Protocol.process(redis.clients.util.RedisInputStream) @bci=1, line=151 (Compiled frame)
- redis.clients.jedis.Protocol.read(redis.clients.util.RedisInputStream) @bci=1, line=215 (Interpreted frame)
- redis.clients.jedis.Connection.readProtocolWithCheckingBroken() @bci=4, line=340 (Interpreted frame)
- redis.clients.jedis.Connection.getRawObjectMultiBulkReply() @bci=1, line=285 (Interpreted frame)
- redis.clients.jedis.JedisPubSub.process(redis.clients.jedis.Client) @bci=1, line=121 (Interpreted frame)
- redis.clients.jedis.JedisPubSub.proceed(redis.clients.jedis.Client, java.lang.String[]) @bci=16, line=115 (Interpreted frame)
- redis.clients.jedis.Jedis.subscribe(redis.clients.jedis.JedisPubSub, java.lang.String[]) @bci=13, line=2680 (Interpreted frame)
- redis.clients.jedis.JedisSentinelPool$MasterListener.run() @bci=78, line=291 (Interpreted frame)
但是据我所知,VM线程不等待本地线程阻塞。
答案 0 :(得分:0)
我从http://vault.centos.org/7.0.1406/updates/Source/SPackages/下载了内核源代码文件(kernel-3.10.0-123.9.3.el7.src.rpm),发现那里有futex_wait错误。
static void get_futex_key_refs(union futex_key *key)
{
if (!key->both.ptr)
return;
switch (key->both.offset & (FUT_OFF_INODE|FUT_OFF_MMSHARED)) {
case FUT_OFF_INODE:
ihold(key->shared.inode); /* implies MB (B) */
break;
case FUT_OFF_MMSHARED:
futex_get_mm(key); /* implies MB (B) */
break;
}
}