Java web application using 100% of a single CPU core

Posted: 2015-03-19 14:54:50

Tags: java multithreading performance tomcat cpu

We use Spring 3.2.5 and run our Java application on Tomcat 7.

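For some unknown reason, after several days of correct behavior, it started using 100% of a single core (each machine has 2 CPUs with 8 cores each). For some periods its overall CPU usage is also high, but spread across the cores.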

We started by investigating GC, with no luck: trying more or less memory and different GC types and configurations did not help.
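Rather than only varying heap sizes and collectors, one way to rule GC in or out is to measure how much wall-clock time the JVM actually spends collecting. The sketch below uses the standard `java.lang.management` GC MXBeans and assumes it runs inside the affected JVM (for example from a diagnostic hook); the class name `GcStats` and the 1-second window are illustrative choices, not part of the original setup. From outside the process, the JDK's `jstat -gcutil` tool reports similar counters.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {
    // Sum of accumulated collection time (ms) over all collectors in this JVM.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if undefined for this collector
            if (t > 0) total += t;
        }
        return total;
    }

    // Fraction of a sampling window that was spent in GC.
    static double gcFraction(long gcBeforeMs, long gcAfterMs, long windowMs) {
        if (windowMs <= 0) throw new IllegalArgumentException("windowMs must be positive");
        return (double) (gcAfterMs - gcBeforeMs) / windowMs;
    }

    public static void main(String[] args) throws InterruptedException {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        long windowMs = 1_000;
        long before = totalGcTimeMillis();
        Thread.sleep(windowMs);
        long after = totalGcTimeMillis();
        System.out.printf("GC share of the last %d ms: %.1f%%%n",
                windowMs, 100 * gcFraction(before, after, windowMs));
    }
}
```

If the GC share stays low while one core is pegged, GC is unlikely to be the cause and the busy thread has to be found some other way.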

While it was using 100% of a single core, we managed to dump the threads with jstack, and there were many threads like these:

Thread 11260: (state = BLOCKED)
 - sun.nio.ch.EPollArrayWrapper.epollWait(long, int, long, int) @bci=0 (Compiled frame; information may be imprecise)

Thread 11375: (state = BLOCKED)
 - sun.nio.ch.EPollArrayWrapper.epollWait(long, int, long, int) @bci=0 (Compiled frame; information may be imprecise)
 - sun.nio.ch.EPollArrayWrapper.poll(long) @bci=18, line=269 (Compiled frame)

Thread 11421: (state = BLOCKED)
 - com.mysql.jdbc.Field.getStringFromBytes(int, int) @bci=144, line=719 (Compiled frame)

We then investigated network traffic, but that was not the cause.

What else can we investigate? What could cause this strange CPU usage, where an entire single core is consumed but the application does not respond?

No thread is in the RUNNING state, but there are many like this:

Thread 12947: (state = IN_NATIVE)
 - java.net.PlainSocketImpl.$$YJP$$socketAccept(java.net.SocketImpl) @bci=0 (Interpreted frame)
 - java.net.PlainSocketImpl.socketAccept(java.net.SocketImpl) @bci=8 (Interpreted frame)
 - java.net.AbstractPlainSocketImpl.accept(java.net.SocketImpl) @bci=13, line=398 (Interpreted frame)
 - java.net.ServerSocket.implAccept(java.net.Socket) @bci=111, line=530 (Interpreted frame)
 - java.net.ServerSocket.accept() @bci=119, line=498 (Interpreted frame)
 - org.apache.catalina.core.StandardServer.await() @bci=269, line=453 (Interpreted frame)
 - org.apache.catalina.startup.Catalina.await() @bci=10, line=777 (Interpreted frame)
 - org.apache.catalina.startup.Catalina.start() @bci=272, line=723 (Interpreted frame)
 - sun.reflect.NativeMethodAccessorImpl.invoke0(java.lang.reflect.Method, java.lang.Object, java.lang.Object[]) @bci=0 (Interpreted frame)
 - sun.reflect.NativeMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=118, line=57 (Interpreted frame)
 - sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=12, line=43 (Interpreted frame)
 - java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[]) @bci=63, line=606 (Interpreted frame)
 - org.apache.catalina.startup.Bootstrap.start() @bci=43, line=321 (Interpreted frame)
 - org.apache.catalina.startup.Bootstrap.main(java.lang.String[]) @bci=187, line=455 (Interpreted frame)

The $$YJP$$ prefix comes from the YourKit (YK) profiler.

On another machine with the same problem, I again found 345 of 497 threads IN_NATIVE:
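The dump format here ("Thread NNN: (state = ...)" with bci and frame annotations) appears to come from `jstack -F`, which reports HotSpot-internal states such as IN_NATIVE and IN_JAVA. The public `java.lang.Thread.State` enum has no IN_NATIVE value: a thread sitting in native code such as `epollWait` is reported as RUNNABLE, so a large IN_NATIVE count is not by itself evidence of CPU consumption (an idle Netty selector waiting in `epollWait` is in exactly that state). A minimal sketch for counting thread states from inside the JVM; the class name `ThreadStateHistogram` is hypothetical.

```java
import java.util.EnumMap;
import java.util.Map;

public class ThreadStateHistogram {
    // Counts the given threads by their java.lang.Thread.State.
    static Map<Thread.State, Integer> countByState(Iterable<Thread> threads) {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread t : threads) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Snapshot of all live threads in this JVM.
        Map<Thread.State, Integer> counts = countByState(Thread.getAllStackTraces().keySet());
        int total = counts.values().stream().mapToInt(Integer::intValue).sum();
        counts.forEach((state, n) -> System.out.printf("%-13s %d of %d%n", state, n, total));
    }
}
```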

Thread 4979: (state = IN_NATIVE)
 - sun.nio.ch.EPollArrayWrapper.epollWait(long, int, long, int) @bci=0 (Compiled frame; information may be imprecise)
 - sun.nio.ch.EPollArrayWrapper.poll(long) @bci=18, line=269 (Compiled frame)
 - sun.nio.ch.EPollSelectorImpl.doSelect(long) @bci=28, line=79 (Compiled frame)
 - sun.nio.ch.SelectorImpl.lockAndDoSelect(long) @bci=37, line=87 (Compiled frame)
 - sun.nio.ch.SelectorImpl.select(long) @bci=30, line=98 (Compiled frame)
 - org.jboss.netty.channel.socket.nio.SelectorUtil.select(java.nio.channels.Selector) @bci=4, line=68 (Compiled frame)
 - org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(java.nio.channels.Selector) @bci=1, line=415 (Compiled frame)
 - org.jboss.netty.channel.socket.nio.AbstractNioSelector.run() @bci=56, line=212 (Compiled frame)
 - org.jboss.netty.channel.socket.nio.AbstractNioWorker.run() @bci=1, line=89 (Interpreted frame)
 - org.jboss.netty.channel.socket.nio.NioWorker.run() @bci=1, line=178 (Interpreted frame)
 - org.jboss.netty.util.ThreadRenamingRunnable.run() @bci=55, line=108 (Interpreted frame)
 - org.jboss.netty.util.internal.DeadLockProofWorker$1.run() @bci=14, line=42 (Interpreted frame)
 - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1145 (Interpreted frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)

I also found one thread in the IN_JAVA state:

Thread 13591: (state = IN_JAVA)
 - org.hibernate.engine.Cascade.cascadeAssociation(java.lang.Object, org.hibernate.type.Type, org.hibernate.engine.CascadeStyle, java.lang.Object, boolean) @bci=16, line=239 (Compiled frame; information may be imprecise)
 - org.hibernate.engine.Cascade.cascadeProperty(java.lang.Object, org.hibernate.type.Type, org.hibernate.engine.CascadeStyle, java.lang.Object, boolean) @bci=42, line=193 (Compiled frame)
 - org.hibernate.engine.Cascade.cascade(org.hibernate.persister.entity.EntityPersister, java.lang.Object, java.lang.Object) @bci=224, line=154 (Compiled frame)
 - org.hibernate.event.def.AbstractFlushingEventListener.cascadeOnFlush(org.hibernate.event.EventSource, org.hibernate.persister.entity.EntityPersister, java.lang.Object, java.lang.Object) @bci=60, line=154 (Compiled frame)
 - org.hibernate.event.def.AbstractFlushingEventListener.prepareEntityFlushes(org.hibernate.event.EventSource) @bci=106, line=145 (Compiled frame)
 - org.hibernate.event.def.AbstractFlushingEventListener.flushEverythingToExecutions(org.hibernate.event.FlushEvent) @bci=79, line=88 (Compiled frame)
 - org.hibernate.event.def.DefaultAutoFlushEventListener.onAutoFlush(org.hibernate.event.AutoFlushEvent) @bci=31, line=58 (Compiled frame)
 - org.hibernate.impl.SessionImpl.autoFlushIfRequired(java.util.Set) @bci=83, line=997 (Compiled frame)
 - org.hibernate.impl.SessionImpl.list(java.lang.String, org.hibernate.engine.QueryParameters) @bci=55, line=1149 (Compiled frame)
 - org.hibernate.impl.QueryImpl.list() @bci=33, line=102 (Compiled frame)
 - org.hibernate.impl.AbstractQueryImpl.uniqueResult() @bci=7, line=835 (Compiled frame)

We may be missing some resource, but this happens both when server load is high and when it is low, and at different times.

1 answer:

Answer 0 (score: 1)

I hope this has already been solved; if not, I suggest the following steps to find the cause of the high CPU utilization:

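  1. Use OS-level commands to find which thread is consuming the CPU: for example, top with the H key pressed, or the -H command-line option (top -H -p followed by the Java process ID); on AIX, htop with F5; on Windows, perfmon or Process Explorer (https://technet.microsoft.com/en-us/sysinternals/bb896653.aspx).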

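  2. Identify the thread causing the high CPU and note its thread ID (you may need to convert it to hexadecimal to match the thread dump).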

  3. Take a thread dump and find the stack of the thread you noted.

  4. Try to understand all the methods being called and which of them is CPU-intensive.

  5. This can also be monitored easily with the JVisualVM Sampler or JProfiler.
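Steps 1 to 4 can also be approximated from inside the JVM with the standard `ThreadMXBean` API: sample per-thread CPU time twice, pick the thread whose CPU time grew the most, and print its stack. This is a minimal sketch, assuming it runs inside the affected JVM (for example from a diagnostic hook) and that the platform supports per-thread CPU time; the class and method names are hypothetical. Note that `ThreadMXBean` IDs are Java thread IDs, not OS thread IDs, so the `toNid` helper is only for the external route: converting a decimal TID from `top -H` to the `nid=0x...` form shown in a regular `jstack` dump.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class HotThreadFinder {
    // Decimal OS thread id (from `top -H`) -> hex form used by jstack's nid=0x... field.
    static String toNid(long osThreadId) {
        return "0x" + Long.toHexString(osThreadId);
    }

    // Java thread id with the largest CPU-time growth over the window,
    // or -1 if per-thread CPU time is not supported on this JVM.
    static long hottestThreadId(ThreadMXBean mx, long windowMillis) throws InterruptedException {
        if (!mx.isThreadCpuTimeSupported()) return -1;
        if (!mx.isThreadCpuTimeEnabled()) mx.setThreadCpuTimeEnabled(true);
        long[] ids = mx.getAllThreadIds();
        long[] before = new long[ids.length];
        for (int i = 0; i < ids.length; i++) before[i] = mx.getThreadCpuTime(ids[i]);
        Thread.sleep(windowMillis);
        long bestId = -1;
        long bestDelta = -1;
        for (int i = 0; i < ids.length; i++) {
            long after = mx.getThreadCpuTime(ids[i]); // -1 if the thread has died
            if (before[i] < 0 || after < 0) continue;
            long delta = after - before[i];
            if (delta > bestDelta) {
                bestDelta = delta;
                bestId = ids[i];
            }
        }
        return bestId;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long id = hottestThreadId(mx, 1_000);
        if (id < 0) {
            System.out.println("Per-thread CPU time is not available");
            return;
        }
        ThreadInfo info = mx.getThreadInfo(id, Integer.MAX_VALUE);
        if (info == null) return; // the thread exited after sampling
        System.out.println("Hottest thread: " + info.getThreadName() + " state=" + info.getThreadState());
        for (StackTraceElement frame : info.getStackTrace()) {
            System.out.println("    at " + frame);
        }
    }
}
```

Sampling a delta, rather than reading total CPU time once, matters here: a long-lived thread with a large accumulated total is not necessarily the one pegging the core right now.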