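Elasticsearch IndexShardGatewayRecoveryException

Time: 2015-06-24 08:48:03

Tags: elasticsearch logstash kibana

I'm new to ELK. Today I found the logs below (there are thousands of pages of them), and Elasticsearch has started using too much CPU. Can anyone help me with this?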

Logs:

[2015-06-24 16:16:52,309][WARN ][cluster.action.shard ] [Bereet] [logstash-2015.06.24][0] received shard failed for [logstash-2015.06.24][0], node[ucXcuxuQQTSz_leAzWq6mQ], [P], s[INITIALIZING], indexUUID [ieIR8uWLQHycnEC_szsNZQ], reason [shard failure [failed recovery][IndexShardGatewayRecoveryException[[logstash-2015.06.24][0] failed to recover shard]; nested: TranslogCorruptedException[translog corruption while reading from stream]; nested: ElasticsearchIllegalArgumentException[No version type match [99]]; ]]
[2015-06-24 16:16:52,332][WARN ][cluster.action.shard ] [Bereet] [logstash-2015.06.24][0] received shard failed for [logstash-2015.06.24][0], node[ucXcuxuQQTSz_leAzWq6mQ], [P], s[INITIALIZING], indexUUID [ieIR8uWLQHycnEC_szsNZQ], reason [master [Bereet][ucXcuxuQQTSz_leAzWq6mQ][iZ23cth9hh5Z][inet[/10.162.41.162:9300]] marked shard as initializing, but shard is marked as failed, resend shard failure]
[2015-06-24 16:16:52,339][WARN ][index.engine ] [Bereet] [logstash-2015.06.24][4] failed to sync translog
[2015-06-24 16:16:52,345][WARN ][indices.cluster ] [Bereet] [[logstash-2015.06.24][4]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [logstash-2015.06.24][4] failed to recover shard
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:290)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:112)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.translog.TranslogCorruptedException: translog corruption while reading from stream
    at org.elasticsearch.index.translog.ChecksummedTranslogStream.read(ChecksummedTranslogStream.java:72)
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:260)
    ... 4 more
Caused by: org.elasticsearch.ElasticsearchIllegalArgumentException: No version type match [116]
    at org.elasticsearch.index.VersionType.fromValue(VersionType.java:307)
    at org.elasticsearch.index.translog.Translog$Create.readFrom(Translog.java:376)
    at org.elasticsearch.index.translog.ChecksummedTranslogStream.read(ChecksummedTranslogStream.java:68)
    ... 5 more
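With thousands of pages of these warnings, it can help to reduce the log to the distinct shards that are failing. The following is a minimal sketch, assuming the log lines look like the "received shard failed for" entries quoted above; failed_shards is a hypothetical helper name, not part of Elasticsearch.

```shell
# Hypothetical helper: print each distinct [index][shard] pair that the
# log reports via "received shard failed for". Reads log text on stdin.
failed_shards() {
  grep -o 'received shard failed for \[[^]]*]\[[0-9]*]' \
    | sed 's/^received shard failed for //' \
    | sort -u
}
```

It would be used as failed_shards < /path/to/elasticsearch.log, where the log path depends on your installation. Note that the log above also shows shard [4] failing through a different message ("marking and sending shard failed"), so this list is a starting point rather than a complete inventory.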

2 answers:

Answer 0 (score: 2)

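The translog appears to be corrupted: TranslogCorruptedException[translog corruption while reading from stream]

I believe that if you simply delete the corrupted translog (under the node's indices/${index_name} subdirectory), it should resolve this particular problem. Keep in mind that any operations recorded only in that translog are lost, and that other problems may surface once the corrupted translog has been deleted or repaired.

Here is a link that may help: http://unpunctualprogrammer.com/2014/05/13/corrupt-elasticsearch-translogs/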

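Answer 1 (score: 0)

For me, this happened after a system crash (the disk had plenty of free space).

There is now an official way to repair a corrupted translog using the bundled elasticsearch-translog tool. You will probably lose any data that had not yet been indexed, so I suggest backing up the translog first (for compliance reasons, for example, or in case someone wants to take the trouble to analyze it).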

First, confirm the problem by running:

curl -XGET 'localhost:9200/_cluster/allocation/explain?pretty'

An easier way to find the affected shards:

curl -XGET 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED
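The grep above matches UNASSIGNED anywhere on the line. As a slightly stricter variant, the filter can test the state column itself. This is a minimal sketch that assumes the five columns requested above (index, shard, prirep, state, unassigned.reason) arrive whitespace-separated in that order; unassigned_shards is a hypothetical helper name.

```shell
# Hypothetical helper: read `_cat/shards` output (columns: index, shard,
# prirep, state, unassigned.reason) on stdin and print only the rows whose
# state column is exactly UNASSIGNED, dropping the state column itself.
unassigned_shards() {
  awk '$4 == "UNASSIGNED" { print $1, $2, $3, $5 }'
}
```

The _cat/shards request from above would be piped into unassigned_shards in place of grep. Each output line then reads index, shard number, p or r, and the reason.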

Stop Elasticsearch.
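With Elasticsearch stopped, this is a reasonable point to take the translog backup suggested earlier, before anything is truncated. The sketch below copies a shard's translog directory into a timestamped folder; backup_translog and the backup location are hypothetical choices, not something the tool requires.

```shell
# Hypothetical helper: copy a shard's translog directory to a timestamped
# folder under a backup root and print the destination path.
# Usage: backup_translog <translog_dir> <backup_root>
backup_translog() {
  src=$1
  dest_root=$2
  # Refuse to continue if the source directory does not exist.
  [ -d "$src" ] || { echo "not a directory: $src" >&2; return 1; }
  dest="$dest_root/translog-backup-$(date +%Y%m%d-%H%M%S)"
  # cp -a preserves ownership, permissions and timestamps where possible.
  mkdir -p "$dest_root" && cp -a "$src" "$dest" && echo "$dest"
}
```

Call it with the translog directory of the affected shard as the first argument and any directory with enough free space as the second.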

Assuming shard number 2 of the index qABaulDIRJyT06G3rBFfrC is affected (your path may differ), run:

/usr/share/elasticsearch/bin/elasticsearch-translog truncate -d /var/lib/elasticsearch/nodes/0/indices/qABaulDIRJyT06G3rBFfrC/2/translog
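The directory passed to -d in that command is made up of the data path, the index directory name and the shard number. A small sketch that assembles it from those three parts, assuming the nodes/0/indices layout shown in the command; translog_dir is a hypothetical helper, and the layout may differ on your installation.

```shell
# Hypothetical helper: build the translog directory for one shard,
# assuming the <data_path>/nodes/0/indices/<index_dir>/<shard>/translog
# layout used in the command above.
# Usage: translog_dir <data_path> <index_dir> <shard_number>
translog_dir() {
  data_path=$1
  index_dir=$2
  shard=$3
  printf '%s/nodes/0/indices/%s/%s/translog\n' "$data_path" "$index_dir" "$shard"
}
```

For the example in this answer, translog_dir /var/lib/elasticsearch qABaulDIRJyT06G3rBFfrC 2 yields the same path as the one passed to -d. Checking that the resulting directory actually exists before running the tool guards against a mistyped index directory or shard number.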

If you ran the tool as root, make sure the newly created files belong to the correct user/group:

chown -R elasticsearch:elasticsearch translog*

Start Elasticsearch. Finally, if it has given up trying to allocate that shard, run the following to force Elasticsearch to retry:

curl -XPOST 'localhost:9200/_cluster/reroute?retry_failed=true'

The command used above to list unassigned shards should now return no results.