Question

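I have a binary protocol implemented with Netty and am running performance tests, and the JVM is crashing with the report below. I don't know how to reproduce the crash on demand, but it happens often, and only under heavy load. I have the following dependencies: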

java 7.0_51-b13
netty 4.0.18_Final
fedora 20

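The array copy appears to happen in a nioEventLoopGroup thread. The performance test sends a large number of small messages over roughly 50 TCP connections: about 1 million 200-byte messages per connection. Each message gets 2 response messages sent back.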

This is how I am setting up Netty:

Bootstrap:

m_serverBootstrap.group(m_eventLoopGroup)
 .channel(NioServerSocketChannel.class)
 .localAddress(m_config.getSmppPort())
 .childAttr(InternalAttributeKeys.METRICS, m_metricRegistry)
 .childHandler(new CustomServerChannelInitializer());

m_serverBindChannelFuture = m_serverBootstrap.bind().sync();

CustomServerChannelInitializer:

protected void initChannel(SocketChannel ch) throws Exception {
    log.info("initChannel(SocketChannel ch) {} {} ",ch,this);
    ch.pipeline()
    .addLast(new IpFilterHandler())
    .addLast(new ProtocolEncoder())
    .addLast(new LengthFieldBasedFrameDecoder(4 * 1024, 0, 4, -4, 0))
    .addLast(new ProtocolDecoder())
    .addLast(new WindowingHandler())
    .addLast(new SequenceNumberAssignmentHandler())
    .addLast("idleState", new IdleStateHandler(idleTime, idleTime, idleTime))
    .addLast("idleDisconnect", m_idleDisconnectHandler)
    .addLast("auth", m_authHandler)
    .addLast("catchall", new CatchallHandler(false));
    ch.config().setAllocator(PooledByteBufAllocator.DEFAULT);
    ch.config().setAutoRead(true);
    log.info("finished initChannel(SocketChannel ch) {} {} ",ch,this);
}
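
A note on the framing configured above: with lengthFieldOffset 0, lengthFieldLength 4 and lengthAdjustment -4, the 4-byte length prefix counts the whole frame, itself included. Below is a minimal sketch of what a load-test client might put on the wire, assuming a big-endian length field (the decoder's default byte order) and an opaque body. FrameSketch and its methods are hypothetical names for illustration and are not part of the code in the question.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper: builds frames whose 4-byte prefix holds the TOTAL frame length,
// which is what LengthFieldBasedFrameDecoder(4 * 1024, 0, 4, -4, 0) expects.
final class FrameSketch {
    static final int LENGTH_FIELD_BYTES = 4;

    // Prepends a big-endian length that includes the length field itself.
    static byte[] encodeFrame(byte[] body) {
        int total = LENGTH_FIELD_BYTES + body.length;
        ByteBuffer buf = ByteBuffer.allocate(total); // ByteBuffer is big-endian by default
        buf.putInt(total);
        buf.put(body);
        return buf.array();
    }

    // Reads the length prefix back from an encoded frame.
    static int frameLength(byte[] frame) {
        return ByteBuffer.wrap(frame).getInt(0);
    }

    public static void main(String[] args) {
        byte[] body = new byte[196];            // 196-byte body + 4-byte prefix = 200-byte message
        Arrays.fill(body, (byte) 0x2A);
        byte[] frame = encodeFrame(body);
        System.out.println(frame.length + " " + frameLength(frame)); // prints "200 200"
    }
}
```

Because initialBytesToStrip is 0, the decoder hands the full frame, prefix included, to the next handler, so ProtocolDecoder would see the same 200 bytes that were sent.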

After the initial connection, the pipeline is modified again in authHandler:

@Override
protected void channelRead0(ChannelHandlerContext ctx, CustomMessage msg) throws Exception {
    ResponseMessage response = auth(msg,ctx);
    ctx.pipeline().replace("auth", "msghandler", new MessageHandler());
    ctx.pipeline().replace("idleState", "inactivityPeriod", new IdleStateHandler());
    ctx.pipeline().addAfter("msghandler", "responsehandler", new ResponseHandler());
    ctx.pipeline().addAfter("responsehandler", "heartbeat", new HeartbeatHandler());
    ctx.pipeline().addAfter("heartbeat", "disconnect", new DisconnectHandler());
    ctx.channel().closeFuture().addListener(new CleanupChannelFutureListener(ctx));
    ctx.writeAndFlush(response);
}

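The JVM report follows. The detailed report is at http://pastebin.com/RV0KqPMf in case it helps. If the JMX threads in the detailed report bother you, I can reproduce the problem without them.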

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007ffa9eb18eaa, pid=1731, tid=140710808540928
#
# JRE version: Java(TM) SE Runtime Environment (7.0_51-b13) (build 1.7.0_51-b13)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.51-b03 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# v  ~StubRoutines::jbyte_disjoint_arraycopy
#
# Core dump written. Default location: /home/user/dir/core or core.1731
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
#

---------------  T H R E A D  ---------------

Current thread (0x00007ff9fc06f800):  JavaThread "nioEventLoopGroup-2-12" [_thread_in_Java, id=1912, stack(0x00007ff9c9b25000,0x00007ff9c9c26000)]

siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), si_addr=0x00007ff987df7715

What is the best way to find out what is causing this SIGSEGV in the JVM?

Answer 1

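This is definitely a Netty bug.

Netty 4.x makes heavy use of the Unsafe API, an Oracle JDK internal API that allows raw memory access. See PlatformDependent0.java in the Netty sources.

The crash log shows that the problem occurred inside an Unsafe.copyMemory call where the destination is a byte[] array in the young generation of the Java heap and the source points to an unmapped memory region. Most likely this was caused by an attempt to read bytes from a native buffer that had already been freed. The Unsafe API performs no sanity checks, so any misuse typically crashes the JVM.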
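
To illustrate the failure mode, here is a minimal sketch of the kind of guard that turns a read from a released buffer into an ordinary Java exception instead of a raw memory access. GuardedBuffer is a hypothetical class written for this answer; it is not Netty's implementation, and a plain byte[] stands in for native memory.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reference-counted buffer; illustrates the idea only, not Netty's code.
final class GuardedBuffer {
    private final AtomicInteger refCnt = new AtomicInteger(1);
    private byte[] memory; // stands in for a native memory region

    GuardedBuffer(int capacity) {
        memory = new byte[capacity];
    }

    // Drops one reference; returns true when the last reference is gone and the memory is freed.
    boolean release() {
        for (;;) {
            int cnt = refCnt.get();
            if (cnt == 0) {
                throw new IllegalStateException("already released");
            }
            if (refCnt.compareAndSet(cnt, cnt - 1)) {
                if (cnt == 1) {
                    memory = null; // "free" the region
                    return true;
                }
                return false;
            }
        }
    }

    // Copies dst.length bytes starting at index, but only while the buffer is still live.
    // Note: the check and the copy are not atomic, so this does not protect against a
    // concurrent release from another thread; it only catches use after release.
    void getBytes(int index, byte[] dst) {
        if (refCnt.get() == 0) {
            throw new IllegalStateException("use after release");
        }
        System.arraycopy(memory, index, dst, 0, dst.length);
    }

    public static void main(String[] args) {
        GuardedBuffer buf = new GuardedBuffer(8);
        buf.getBytes(0, new byte[4]);     // fine: buffer is live
        buf.release();                    // last reference dropped
        try {
            buf.getBytes(0, new byte[4]); // rejected by the guard
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "use after release"
        }
    }
}
```

A raw Unsafe.copyMemory call has no equivalent of the refCnt check, so a read that slips past such a guard with a stale address goes straight to the unmapped region and takes the whole JVM down.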

Answer 2

Upgrading from Netty 4.0.18.Final to 4.0.20.Final resolved the problem.

Netty 4, Java 7: JVM crashes with SIGSEGV under load
