Too many "BLOCKED" and "TIMED_WAITING" threads in Skywalking

Time: 2019-05-14 01:55:55

Tags: java elasticsearch

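[Environment] One single virtual machine (4C / 8GB) hosting both the es and skywalking servers. A standalone single-node elasticsearch, with Xmx4g allocated to its JVM. The Skywalking server is deployed on the same VM, with Xmx2g allocated to its JVM. 2 Spring Boot applications (deployed on another VM) connect to the Skywalking server.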

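[Symptoms] 1. After restarting the es and skywalking servers, the skywalking server receives a continuous stream of traces from the applications. 2. This lasts for 15-30 minutes. After that, no more traces reach the skywalking server.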

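[Efforts so far] 1. I checked elasticsearch.log. There are no obvious error messages in it. 2. I checked gc.log. There are no GC overhead messages in it. 3. I printed the stack traces with jstack, and the output looks like this: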

"DataCarrier.IndicatorPersistentWorker.all_p99.Consumser.0.Thread" #38 daemon prio=5 os_prio=0 tid=0x00007f3e950d8800 nid=0x2eab waiting on condition [0x00007f3dfb7f6000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.skywalking.apm.commons.datacarrier.consumer.ConsumerThread.run(ConsumerThread.java:72)

"DataCarrier.IndicatorPersistentWorker.endpoint_avg.Consumser.0.Thread" #43 daemon prio=5 os_prio=0 tid=0x00007f3e950e2800 nid=0x2eb0 waiting for monitor entry [0x00007f3dfb2f1000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.elasticsearch.action.bulk.BulkProcessor.internalAdd(BulkProcessor.java:286)
        - waiting to lock <0x00000000813c8440> (a org.elasticsearch.action.bulk.BulkProcessor)
        at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:271)
        at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:267)
        at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:253)
        at org.apache.skywalking.oap.server.storage.plugin.elasticsearch.base.BatchProcessEsDAO.lambda$batchPersistence$0(BatchProcessEsDAO.java:64)
        at org.apache.skywalking.oap.server.storage.plugin.elasticsearch.base.BatchProcessEsDAO$$Lambda$278/81782502.accept(Unknown Source)
        at java.lang.Iterable.forEach(Iterable.java:75)
        at org.apache.skywalking.oap.server.storage.plugin.elasticsearch.base.BatchProcessEsDAO.batchPersistence(BatchProcessEsDAO.java:62)
        at org.apache.skywalking.oap.server.core.analysis.worker.PersistenceWorker.onWork(PersistenceWorker.java:51)
        at org.apache.skywalking.oap.server.core.analysis.worker.IndicatorPersistentWorker.onWork(IndicatorPersistentWorker.java:63)
        at org.apache.skywalking.oap.server.core.analysis.worker.IndicatorPersistentWorker$PersistentConsumer.consume(IndicatorPersistentWorker.java:153)
        at org.apache.skywalking.apm.commons.datacarrier.consumer.ConsumerThread.consume(ConsumerThread.java:101)
        at org.apache.skywalking.apm.commons.datacarrier.consumer.ConsumerThread.run(ConsumerThread.java:68)

"DataCarrier.IndicatorAggregateWorker.instance_jvm_cpu.Consumser.0.Thread" #152 daemon prio=5 os_prio=0 tid=0x00007f3e951c1000 nid=0x2f1d sleeping[0x00007f3df4584000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.skywalking.apm.commons.datacarrier.buffer.Buffer.save(Buffer.java:64)
        at org.apache.skywalking.apm.commons.datacarrier.buffer.Channels.save(Channels.java:52)
        at org.apache.skywalking.apm.commons.datacarrier.DataCarrier.produce(DataCarrier.java:88)
        at org.apache.skywalking.oap.server.core.analysis.worker.IndicatorPersistentWorker.in(IndicatorPersistentWorker.java:68)
        at org.apache.skywalking.oap.server.core.analysis.worker.IndicatorTransWorker.in(IndicatorTransWorker.java:89)
        at org.apache.skywalking.oap.server.core.analysis.worker.IndicatorTransWorker.in(IndicatorTransWorker.java:32)
        at org.apache.skywalking.oap.server.core.remote.client.SelfRemoteClient.push(SelfRemoteClient.java:55)
        at org.apache.skywalking.oap.server.core.remote.RemoteSenderService.send(RemoteSenderService.java:51)
        at org.apache.skywalking.oap.server.core.analysis.worker.IndicatorRemoteWorker.in(IndicatorRemoteWorker.java:51)
        at org.apache.skywalking.oap.server.core.analysis.worker.IndicatorRemoteWorker.in(IndicatorRemoteWorker.java:33)
        at org.apache.skywalking.oap.server.core.analysis.worker.IndicatorAggregateWorker.lambda$sendToNext$0(IndicatorAggregateWorker.java:93)
        at org.apache.skywalking.oap.server.core.analysis.worker.IndicatorAggregateWorker$$Lambda$203/1364775767.accept(Unknown Source)
        at java.util.HashMap$Values.forEach(HashMap.java:981)
        at org.apache.skywalking.oap.server.core.analysis.worker.IndicatorAggregateWorker.sendToNext(IndicatorAggregateWorker.java:88)
        at org.apache.skywalking.oap.server.core.analysis.worker.IndicatorAggregateWorker.onWork(IndicatorAggregateWorker.java:73)
        at org.apache.skywalking.oap.server.core.analysis.worker.IndicatorAggregateWorker.access$100(IndicatorAggregateWorker.java:38)
        at org.apache.skywalking.oap.server.core.analysis.worker.IndicatorAggregateWorker$AggregatorConsumer.consume(IndicatorAggregateWorker.java:131)
        at org.apache.skywalking.apm.commons.datacarrier.consumer.ConsumerThread.consume(ConsumerThread.java:101)
        at org.apache.skywalking.apm.commons.datacarrier.consumer.ConsumerThread.run(ConsumerThread.java:68)

The jstack output contains many "TIMED_WAITING" and "BLOCKED" threads.

4. I also checked the es cluster status and node status. The heap percentage is within the normal range, and the cluster stays green.
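
To make "many BLOCKED threads" more concrete, a jstack dump can be summarized by thread state and by contended monitor. The following is only a minimal sketch, not part of Skywalking or Elasticsearch: the class name JstackSummary and its methods are made up for illustration, and it assumes the HotSpot jstack text format shown above (a quoted thread name on the header line, a "java.lang.Thread.State:" line, and "- waiting to lock <addr>" / "- locked <addr>" lines).

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: summarizes a jstack text dump (assumed HotSpot format).
public class JstackSummary {
    private static final Pattern STATE =
        Pattern.compile("java\\.lang\\.Thread\\.State: (\\w+)");
    private static final Pattern WAITING =
        Pattern.compile("- waiting to lock <(0x[0-9a-f]+)>");
    private static final Pattern LOCKED =
        Pattern.compile("- locked <(0x[0-9a-f]+)>");

    // Counts how many threads are in each java.lang.Thread.State.
    static Map<String, Integer> countStates(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = STATE.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Monitor address -> names of threads blocked while trying to enter it.
    static Map<String, List<String>> waitersByLock(List<String> lines) {
        return collect(lines, WAITING);
    }

    // Monitor address -> names of threads whose stack shows they hold it.
    static Map<String, List<String>> ownersByLock(List<String> lines) {
        return collect(lines, LOCKED);
    }

    private static Map<String, List<String>> collect(List<String> lines, Pattern p) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        String current = "?";
        for (String line : lines) {
            // A thread header line starts with the quoted thread name.
            if (line.startsWith("\"")) {
                int end = line.indexOf('"', 1);
                current = end > 0 ? line.substring(1, end) : line;
            }
            Matcher m = p.matcher(line);
            if (m.find()) {
                result.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(current);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println("Thread states: " + countStates(lines));
        Map<String, List<String>> owners = ownersByLock(lines);
        waitersByLock(lines).forEach((lock, waiters) ->
            System.out.println(lock + " held by "
                + owners.getOrDefault(lock, Collections.<String>emptyList())
                + ", " + waiters.size() + " thread(s) waiting"));
    }
}
```

After compiling with javac, it could be run as `java JstackSummary dump.txt`, where dump.txt is the saved jstack output. If one monitor (for example the BulkProcessor instance at 0x00000000813c8440 in the dump above) has many waiters, the full stack of the thread reported as holding it is probably the next thing to read.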

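[Goal] I would like to know whether you can give me some hints on the following: - Whether there is a recommended hardware sizing for a skywalking deployment, for example something like 1C / 2GB per traced application. - Whether there are any logs or troubleshooting methods I can use to dig deeper into the problem.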

0 answers:

No answers