We have a 3-node (c4.large) Elasticsearch 2.1.0 cluster (nodes I'll call es-live-0, es-live-1 and es-live-2) set up on AWS EC2. It has been running well and serves our webapp.
One of the nodes also hosts a Kibana instance, which collects and displays the data that marvel-agent sends to it.
Yesterday marvel-agent was unable to create a new index for the Marvel data and failed; some sample logs are provided below. The cluster subsequently went down, but its status recovered to green after about 20 minutes. However, on arriving at the office this morning I found that this was all lies! Our webapp requests were timing out, and although es-live-0 looked fine on the EC2 monitoring dashboard, I couldn't get into it. A restart fixed the problem, but seeing as this is our production system I'd really like to get to the bottom of it.
After reading this thread: https://discuss.elastic.co/t/marvel-high-index-rate/38935/4, I realise we should move Kibana to a dedicated node and send the Marvel data to an Elasticsearch instance running on it. Could that be the underlying problem? To give you an idea, our webapp uses 3 main indices with just over 23 shards between them. The total number of shards on the system is 145, most of which relate to the Marvel data. At the same time, it doesn't feel like that number of shards should be enough to make one of the nodes unresponsive, or am I wrong to assume that?
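For reference, my understanding is that pointing marvel-agent at a separate monitoring cluster is done with an HTTP exporter in elasticsearch.yml, roughly like the sketch below (the host name is just a placeholder, not our actual setup):

# elasticsearch.yml on each production node (sketch; "monitoring-node" is a placeholder)
marvel.agent.exporters:
  my_monitoring_cluster:
    type: http
    host: ["http://monitoring-node:9200"]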
Also, if one of the nodes was not responding, why didn't the cluster drop it and carry on as a two-node setup?
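We are running with mostly default discovery settings; as far as I understand, whether a non-responsive node gets dropped is governed by the zen fault-detection pings, roughly these knobs (the values below are my understanding of the 2.x defaults plus the usual 3-node recommendation, not something we have tuned):

# Sketch of the zen discovery/fault-detection settings in elasticsearch.yml
discovery.zen.minimum_master_nodes: 2   # usual recommendation for a 3-node cluster
discovery.zen.fd.ping_interval: 1s      # how often each node is pinged
discovery.zen.fd.ping_timeout: 30s      # how long to wait for a ping response
discovery.zen.fd.ping_retries: 3        # consecutive failures before a node is considered failed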
示例日志:
Feb 02 00:02:13 es-live-1 elasticsearch-live.log: [2016-02-02 00:02:17,006][ERROR][marvel.agent ] [Joshua Guthrie] background thread had an uncaught exception
Feb 02 00:02:13 es-live-1 elasticsearch-live.log: ElasticsearchException[failed to flush exporter bulks]
Feb 02 00:02:13 es-live-1 elasticsearch-live.log: at org.elasticsearch.marvel.agent.exporter.ExportBulk$Compound.flush(ExportBulk.java:104)
Feb 02 00:02:13 es-live-1 elasticsearch-live.log: at org.elasticsearch.marvel.agent.exporter.ExportBulk.close(ExportBulk.java:53)
Feb 02 00:02:13 es-live-1 elasticsearch-live.log: at org.elasticsearch.marvel.agent.AgentService$ExportingWorker.run(AgentService.java:201)
Feb 02 00:02:13 es-live-1 elasticsearch-live.log: at java.lang.Thread.run(Thread.java:745)
Feb 02 00:02:13 es-live-1 elasticsearch-live.log: Suppressed: ElasticsearchException[failed to flush [default_local] exporter bulk]; nested: ElasticsearchException[failure in bulk execution:
Feb 02 00:02:13 es-live-1 elasticsearch-live.log: [0]: index [.marvel-es-2016.02.02], type [node_stats], id [null], message [RemoteTransportException[[Corruptor][es-live-0:9300][indices:admin/create]]; nested: ProcessClusterEventTimeoutException[failed to process cluster event (acquire index lock) within 1m];]];
Feb 02 00:02:13 es-live-1 elasticsearch-live.log: at org.elasticsearch.marvel.agent.exporter.ExportBulk$Compound.flush(ExportBulk.java:106)
Feb 02 00:02:13 es-live-1 elasticsearch-live.log: ... 3 more
Feb 02 00:02:13 es-live-1 elasticsearch-live.log: Caused by: ElasticsearchException[failure in bulk execution:
Feb 02 00:02:13 es-live-1 elasticsearch-live.log: [0]: index [.marvel-es-2016.02.02], type [node_stats], id [null], message [RemoteTransportException[[Corruptor][es-live-0:9300][indices:admin/create]]; nested: ProcessClusterEventTimeoutException[failed to process cluster event (acquire index lock) within 1m];]]
Feb 02 00:02:13 es-live-1 elasticsearch-live.log: at org.elasticsearch.marvel.agent.exporter.local.LocalBulk.flush(LocalBulk.java:114)
Feb 02 00:02:13 es-live-1 elasticsearch-live.log: at org.elasticsearch.marvel.agent.exporter.ExportBulk$Compound.flush(ExportBulk.java:101)
Feb 02 00:02:13 es-live-1 elasticsearch-live.log: ... 3 more