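We are running Tomcat 8.0.26 in production. After a routine deployment, we switched traffic to the newly deployed Tomcat cluster. Each node was handling about 800 requests per second at the time. All nodes ran fine except one, which accepted connections but never responded. Every client connection to that node hung until it timed out. I then cut off traffic to that node and captured some state:
There were only a few Tomcat worker threads, and all of them were parked, waiting for new tasks.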
"http-nio2-8081-exec-613" #11568 daemon prio=5 os_prio=0 tid=0x00007f710a660800 nid=0xb202 waiting on condition [0x00007f70fee03000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000003c003f008> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:103)
    at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:31)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:745)
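For context on how I read this dump: as far as I understand, a worker parked in LinkedBlockingQueue.take() is simply idle, not stuck on a request. The sketch below is a minimal illustration of that using a plain JDK thread pool, not Tomcat's own executor; the class name IdleWorkerState and its helper method are hypothetical names I made up for this example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class: shows the state of an idle pool worker.
public class IdleWorkerState {

    // Runs one trivial task, then reports the worker's state once it has no more work.
    static Thread.State idleWorkerState() throws Exception {
        // A fixed pool is backed by a LinkedBlockingQueue, like the queue in the dump.
        ExecutorService pool = Executors.newFixedThreadPool(1);
        AtomicReference<Thread> worker = new AtomicReference<>();
        try {
            // Capture the worker thread by running a task on it and waiting for completion.
            pool.submit(() -> worker.set(Thread.currentThread())).get();
            Thread t = worker.get();

            // Give the worker a moment to go back to the queue and park (at most 5 seconds).
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (t.getState() != Thread.State.WAITING && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
            return t.getState();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Idle worker state: " + idleWorkerState());
    }
}
```

If this reading is right, the dump suggests that the accepted connections were never handed to the worker pool at all, rather than that the workers were blocked while processing them.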
wget from localhost: the connection is established, but the server does not respond.
$ wget localhost:8081
--2015-09-06 19:25:39-- http://localhost:8081/
Resolving localhost... ::1, 127.0.0.1
Connecting to localhost|::1|:8081... connected.
HTTP request sent, awaiting response...
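The same check can be scripted so that it fails fast instead of hanging. This is only a rough sketch under my own assumptions: the HttpProbe class, its Result enum, and the 5 second timeout are hypothetical and not part of Tomcat or of our setup. It connects, sends a minimal GET, and waits for the first byte of a response.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

// Hypothetical probe: tells "cannot connect" apart from "connected but silent".
public class HttpProbe {

    enum Result { RESPONDED, NO_RESPONSE, CONNECT_FAILED }

    static Result probe(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            try {
                // Fails if the port is closed or the handshake does not finish in time.
                socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            } catch (IOException e) {
                return Result.CONNECT_FAILED;
            }
            // Bound the wait for the response instead of hanging like wget did.
            socket.setSoTimeout(timeoutMillis);

            OutputStream out = socket.getOutputStream();
            String request = "GET / HTTP/1.1\r\nHost: " + host + "\r\nConnection: close\r\n\r\n";
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();

            InputStream in = socket.getInputStream();
            try {
                // Any byte at all counts as a response; end of stream counts as silence.
                return in.read() >= 0 ? Result.RESPONDED : Result.NO_RESPONSE;
            } catch (SocketTimeoutException e) {
                return Result.NO_RESPONSE;
            }
        } catch (IOException e) {
            // Write or close failures are reported as a connection-level failure.
            return Result.CONNECT_FAILED;
        }
    }

    public static void main(String[] args) {
        // Port 8081 is the connector port from the output above.
        System.out.println(probe("localhost", 8081, 5000));
    }
}
```

On the broken node I would expect this to report NO_RESPONSE, since the TCP handshake completes but nothing is ever written back.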
lsof output:
$ lsof -n -i tcp:8081
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 34609 web_server 49u IPv6 2195777660 0t0 TCP *:tproxy (LISTEN)
wget 41834 web_server 3u IPv4 2198776305 0t0 TCP 127.0.0.1:37324->127.0.0.1:tproxy (ESTABLISHED)
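Even after the connection was established, all Tomcat worker threads were still parked.
That is everything I observed. Can anyone give me some hints about this issue?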