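I'm new to ELK. Today I found the log entries below (there are several thousand pages of them), and Elasticsearch has started using far too much CPU. Can anyone help me with this?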
LOGS:
[2015-06-24 16:16:52,309][WARN ][cluster.action.shard ] [Bereet] [logstash-2015.06.24][0] received shard failed for [logstash-2015.06.24][0], node[ucXcuxuQQTSz_leAzWq6mQ], [P], s[INITIALIZING], indexUUID [ieIR8uWLQHycnEC_szsNZQ], reason [shard failure [failed recovery][IndexShardGatewayRecoveryException[[logstash-2015.06.24][0] failed to recover shard]; nested: TranslogCorruptedException[translog corruption while reading from stream]; nested: ElasticsearchIllegalArgumentException[No version type match [99]]; ]]
[2015-06-24 16:16:52,332][WARN ][cluster.action.shard ] [Bereet] [logstash-2015.06.24][0] received shard failed for [logstash-2015.06.24][0], node[ucXcuxuQQTSz_leAzWq6mQ], [P], s[INITIALIZING], indexUUID [ieIR8uWLQHycnEC_szsNZQ], reason [master [Bereet][ucXcuxuQQTSz_leAzWq6mQ][iZ23cth9hh5Z][inet[/10.162.41.162:9300]] marked shard as initializing, but shard is marked as failed, resend shard failure]
[2015-06-24 16:16:52,339][WARN ][index.engine ] [Bereet] [logstash-2015.06.24][4] failed to sync translog
[2015-06-24 16:16:52,345][WARN ][indices.cluster ] [Bereet] [[logstash-2015.06.24][4]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [logstash-2015.06.24][4] failed to recover shard
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:290)
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:112)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.translog.TranslogCorruptedException: translog corruption while reading from stream
at org.elasticsearch.index.translog.ChecksummedTranslogStream.read(ChecksummedTranslogStream.java:72)
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:260)
... 4 more
Caused by: org.elasticsearch.ElasticsearchIllegalArgumentException: No version type match [116]
at org.elasticsearch.index.VersionType.fromValue(VersionType.java:307)
at org.elasticsearch.index.translog.Translog$Create.readFrom(Translog.java:376)
at org.elasticsearch.index.translog.ChecksummedTranslogStream.read(ChecksummedTranslogStream.java:68)
... 5 more
Answer 0 (score: 2)
The translog appears to be corrupted: TranslogCorruptedException[translog corruption while reading from stream]
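With thousands of pages of log output, it can help to reduce the log to the distinct shards that failed recovery. This is a minimal sketch, assuming the log lines have the same shape as the ones in the question. The function name `failed_shards` and the log path in the usage comment are illustrative assumptions, not anything shipped with Elasticsearch.

```shell
# Print each distinct [index][shard] pair that logged "failed to recover shard".
# Reads log text from stdin.
failed_shards() {
    # Match "[index][N] failed to recover shard"; the index part may not
    # contain brackets, so surrounding exception text is not swallowed.
    grep -o '\[[^][]*]\[[0-9][0-9]*] failed to recover shard' \
        | sed 's/ failed to recover shard$//' \
        | sort -u
}

# Usage (the log location is an assumption; adjust to your install):
#   cat /var/log/elasticsearch/*.log | failed_shards
```

For the excerpt in the question this would list shards 0 and 4 of logstash-2015.06.24, which tells you which translog directories to look at.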
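I believe that if you simply delete the corrupted translog files (in the node's /indices/${index_name} subdirectories), this particular problem should go away. You may run into other problems as you delete or repair the corrupted translogs.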
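Before deleting anything, it is worth listing exactly which translog directories belong to the affected index. This is a rough sketch, assuming a layout where each shard directory under indices/<index>/ contains a translog directory. The function name `list_translog_dirs` and the example data path are hypothetical.

```shell
# Print every translog directory that belongs to one index.
# $1: the node's data directory (assumption: e.g. /var/lib/elasticsearch)
# $2: the index directory name (e.g. logstash-2015.06.24)
list_translog_dirs() {
    find "$1" -type d -path "*/indices/$2/*" -name translog 2>/dev/null | sort
}

# Usage (the data path is an assumption; check path.data in your config):
#   list_translog_dirs /var/lib/elasticsearch logstash-2015.06.24
```

Reviewing this list first avoids removing translogs of healthy shards or of other indices by accident.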
Here is a link that may be useful: http://unpunctualprogrammer.com/2014/05/13/corrupt-elasticsearch-translogs/
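Answer 1 (score: 0)
For me, this happened after a system crash (the disk had plenty of free space).
There is now an official way to repair a corrupted translog, using the bundled elasticsearch-translog tool. However, you may lose data that had not yet been indexed, so I recommend backing up the translog first (for example, for compliance reasons, or in case someone cares enough to have it analyzed later).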
First, confirm the problem by running:
curl -XGET 'localhost:9200/_cluster/allocation/explain?pretty'
An easier way to find the affected shards:
curl -XGET 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED
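If you want to filter on the state column itself rather than on any occurrence of the word, the same output can be passed through awk. This is a small sketch, assuming the five columns requested with h= above arrive in that order. The function name `unassigned_shards` is made up for illustration.

```shell
# Read "_cat/shards" output (columns: index shard prirep state unassigned.reason)
# from stdin and print "index/shard prirep reason" for unassigned shards only.
unassigned_shards() {
    awk '$4 == "UNASSIGNED" { printf "%s/%s %s %s\n", $1, $2, $3, $5 }'
}

# Usage:
#   curl -s -XGET 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' \
#     | unassigned_shards
```

An empty result means no shard is currently unassigned.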
Stop Elasticsearch.
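With Elasticsearch stopped, this is a convenient moment to take the translog backup recommended above, before anything is truncated. This is a minimal sketch: the function name `backup_translog` and the backup location in the usage comment are assumptions, and the translog path must be replaced with the one for your affected shard.

```shell
# Copy a shard's translog directory into a timestamped backup directory
# and print the backup location.
# $1: translog directory of the affected shard
# $2: directory that should hold the backups (hypothetical; choose your own)
backup_translog() {
    src=$1
    dest="$2/translog-backup-$(date +%Y%m%d-%H%M%S)"
    # Refuse to continue if the source is not an existing directory.
    [ -d "$src" ] || { echo "not a directory: $src" >&2; return 1; }
    # -R copies recursively, -p preserves ownership, mode and timestamps.
    mkdir -p "$dest" && cp -Rp "$src" "$dest/" && echo "$dest"
}

# Usage (both paths are placeholders):
#   backup_translog /path/to/shard/translog /root/es-backups
```

The copy keeps the original file names, so the backup can be handed over as-is if someone wants to inspect the corrupted translog later.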
Assuming shard number 2 of the index qABaulDIRJyT06G3rBFfrC is affected (your path may differ), run:
/usr/share/elasticsearch/bin/elasticsearch-translog truncate -d /var/lib/elasticsearch/nodes/0/indices/qABaulDIRJyT06G3rBFfrC/2/translog
If you ran the tool as root, make sure the newly created files belong to the correct user and group:
chown -R elasticsearch:elasticsearch translog*
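Start Elasticsearch. Finally, if it has given up trying to allocate that shard, run the following command to force Elasticsearch to retry:
curl -XPOST 'localhost:9200/_cluster/reroute?retry_failed=true'
The command for listing unassigned shards should no longer return any results.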