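My Storm topology runs for a while (say 18 h or 21 h) and then goes idle. The topology stops responding to messages sent from Kafka. I have looked through the logs, but I cannot work out what is happening.
When the topology is not responding to Kafka messages, I get these logs: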
.ZkCoordinator [INFO] Task [7/8] Refreshing partition manager connections
2016-03-30 02:59:15 s.k.DynamicBrokersReader [INFO] Read partition info from zookeeper: GlobalPartitionInformation{partitionMap={0=IP:6667, 1=IP:6667, 2=IP:6667, 3=IP:6667, 4=IP:6667, 5=IP:6667, 6=IP:6667, 7=IP:6667}}
2016-03-30 02:59:15 s.k.KafkaUtils [INFO] Task [7/8] assigned [Partition{host=IP:6667, partition=6}]
2016-03-30 02:59:15 s.k.ZkCoordinator [INFO] Task [7/8] Deleted partition managers: []
2016-03-30 02:59:15 s.k.ZkCoordinator [INFO] Task [7/8] New partition managers: []
2016-03-30 02:59:15 s.k.ZkCoordinator [INFO] Task [7/8] Finished refreshing
2016-03-30 03:01:15 s.k.ZkCoordinator [INFO] Task [7/8] Refreshing partition manager connections
2016-03-30 03:01:15 s.k.DynamicBrokersReader [INFO] Read partition info from zookeeper: GlobalPartitionInformation{partitionMap={0=IP:6667, 1=IP:6667, 2=IP:6667, 3=IP:6667, 4=IP:6667, 5=IP:6667, 6=IP:6667}
2016-03-30 03:01:15 s.k.KafkaUtils [INFO] Task [7/8] assigned [Partition{host=IP:6667, partition=6}]
2016-03-30 03:01:15 s.k.ZkCoordinator [INFO] Task [7/8] Deleted partition managers: []
2016-03-30 03:01:15 s.k.ZkCoordinator [INFO] Task [7/8] New partition managers: []
2016-03-30 03:01:15 s.k.ZkCoordinator [INFO] Task [7/8] Finished refreshing
2016-03-30 03:03:15 s.k.ZkCoordinator [INFO] Task [7/8] Refreshing partition manager connections
2016-03-30 03:03:15 s.k.DynamicBrokersReader [INFO] Read partition info from zookeeper: GlobalPartitionInformation{partitionMap={0=IP:6667, 1=IP:6667, 2=IP:6667, 3=IP:6667, 4=IP:6667, 5=IP:6667, 6=IP}
2016-03-30 03:03:15 s.k.KafkaUtils [INFO] Task [7/8] assigned [Partition{host=IP:6667, partition=6}]
2016-03-30 03:03:15 s.k.ZkCoordinator [INFO] Task [7/8] Deleted partition managers: []
2016-03-30 03:03:15 s.k.ZkCoordinator [INFO] Task [7/8] New partition managers: []
2016-03-30 03:03:15 s.k.ZkCoordinator [INFO] Task [7/8] Finished refreshing
2016-03-30 03:05:15 s.k.ZkCoordinator [INFO] Task [7/8] Refreshing partition manager connections
How can I track down the problem?
Below is the thread dump of one of the PIDs:
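As a first check, the timestamps in the log above could be used to confirm that the spout task is still refreshing its partition managers on a regular schedule. The following is only a rough sketch under some assumptions: the worker log has one entry per line, the refresh cycle ends with the English storm-kafka message "Finished refreshing", and the helper names `refresh_times` and `refresh_gaps` are made up for illustration and are not part of Storm.

```python
import re
from datetime import datetime

# One log entry: timestamp, logger name, level, message.
# Assumption: the worker log uses this layout, as in the excerpt above.
ENTRY = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\S+) \[(\w+)\] (.*)"
)


def refresh_times(log_text):
    """Return the timestamp of every 'Finished refreshing' entry, in order."""
    times = []
    for line in log_text.splitlines():
        m = ENTRY.match(line.strip())
        if m and "Finished refreshing" in m.group(4):
            times.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S"))
    return times


def refresh_gaps(times):
    """Return the number of seconds between consecutive refreshes."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


# Three entries taken from the log excerpt above.
SAMPLE = """\
2016-03-30 02:59:15 s.k.ZkCoordinator [INFO] Task [7/8] Finished refreshing
2016-03-30 03:01:15 s.k.ZkCoordinator [INFO] Task [7/8] Finished refreshing
2016-03-30 03:03:15 s.k.ZkCoordinator [INFO] Task [7/8] Finished refreshing
"""

if __name__ == "__main__":
    print(refresh_gaps(refresh_times(SAMPLE)))  # prints [120.0, 120.0]
```

A steady gap like this would suggest that the spout task is still alive and still talking to ZooKeeper, so the next thing to look at is probably why no tuples are being emitted rather than whether the worker has died.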
Thread 26783: (state = BLOCKED)
- sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
- java.util.concurrent.locks.LockSupport.parkNanos(java.lang.Object, long) @bci=20, line=226 (Compiled frame)
- java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(long) @bci=68, line=2082 (Compiled frame)
- java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take() @bci=122, line=1090 (Compiled frame)
- java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take() @bci=1, line=807 (Compiled frame)
- java.util.concurrent.ThreadPoolExecutor.getTask() @bci=156, line=1068 (Compiled frame)
- java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=26, line=1130 (Compiled
- java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
Thread 26776: (state = BLOCKED)
- sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
- java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Compiled frame)
- java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() @bci=42, line=2043 (Compiled frame)
- java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take() @bci=98, line=1085 (Compiled frame)
- java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take() @bci=1, line=807 (Compiled frame)
- java.util.concurrent.ThreadPoolExecutor.getTask() @bci=156, line=1068 (Compiled frame)
- java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=26, line=1130 (Interpreted
- java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
Thread 26769: (state = BLOCKED)
- sun.misc.Unsafe.park(boolean, long) @bci=0 (Interpreted frame)
- java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Interpreted frame)
- java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() @bci=42, line=2043 (Interpreted frame)
- java.util.concurrent.DelayQueue.take() @bci=28, line=209 (Interpreted frame)
- java.util.concurrent.DelayQueue.take() @bci=1, line=68 (Interpreted frame)
- org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop() @bci=10, line=781 (Interpreted frame)
- org.apache.curator.framework.imps.CuratorFrameworkImpl.access$400(org.apache.curator.framework.imps.CuratorFrameworkImpl) line=57 (Interpreted frame)
- org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call() @bci=4, line=275 (Interpreted frame)