Elasticsearch disk space problems after closing/opening an index

Date: 2016-04-01 22:00:05

Tags: elasticsearch diskspace

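I recently needed to close and reopen an Elasticsearch index in order to add a custom analyzer and create a mapping. Since then I have been seeing disk space problems on every node, and I'm not sure what the best approach is.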

Here are some selected lines from the log file on the master node that day:

[2016-03-18 01:54:46,161][INFO ][cluster.metadata] [instance name] closing indices [[prod]]
[2016-03-18 01:54:46,161][INFO ][cluster.metadata] [instance name] opening indices [[prod]]
[2016-03-18 01:54:48,493][WARN ][cluster.routing.allocation.decider] [instance name] After allocating, node [nodename] would have less than the required 0b free bytes threshold (-28916726190 bytes free), preventing allocation
[2016-03-18 01:54:48,494][WARN ][cluster.routing.allocation.decider] [instance name] After allocating, node [nodename] would have less than the required 0b free bytes threshold  (-29217364398 bytes free), preventing allocation

...one of these lines for each ES node...
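A minimal sketch of the arithmetic behind those WARN lines, assuming the allocation decider subtracts the size of the shard it wants to place from the node's free bytes and refuses the allocation when the result falls below the threshold (`0b` in the log). The function names and the 50 GiB / 73 GiB sample figures are hypothetical; only the negative byte count comes from the log.

```python
# Sketch of the disk-threshold check implied by the WARN lines.
# Assumption: free-bytes-after-allocation = node free bytes - shard size,
# and the allocation is refused when that value is below the threshold.

GIB = 1024 ** 3


def free_after_allocation(free_bytes: int, shard_bytes: int) -> int:
    """Free bytes the node would have left once the shard is placed."""
    return free_bytes - shard_bytes


def allocation_allowed(free_bytes: int, shard_bytes: int,
                       threshold_bytes: int = 0) -> bool:
    """True when the node would stay at or above the free-bytes threshold."""
    return free_after_allocation(free_bytes, shard_bytes) >= threshold_bytes


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB for easier reading."""
    return n_bytes / GIB


# Value reported in the first WARN line: the node would be short by this much.
reported = -28916726190
print(f"{to_gib(reported):.1f} GiB")  # -26.9 GiB

# Hypothetical numbers: 50 GiB free on the node, a shard of about 73 GiB.
print(allocation_allowed(50 * GIB, 73 * GIB))  # False
```

Read this way, each node is roughly 27 GiB short of being able to take another copy of a shard of this size.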

Then each node shows something like this:

[2016-03-18 01:54:49,500][DEBUG][action.search.type] [instance name] All shards failed for phase: [query]
org.elasticsearch.transport.RemoteTransportException: instance name]][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.index.shard.IllegalIndexShardStateException: [prod][0] CurrentState[RECOVERING] operations only allowed when started/relocated
    at org.elasticsearch.index.shard.IndexShard.readAllowed(IndexShard.java:1000)
    at org.elasticsearch.index.shard.IndexShard.acquireSearcher(IndexShard.java:793)
    at org.elasticsearch.index.shard.IndexShard.acquireSearcher(IndexShard.java:789)
    at org.elasticsearch.search.SearchService.createContext(SearchService.java:552)
    at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:532)
    at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:294)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:776)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:767)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Then more lines like the first block of log output, and then:

[2016-03-18 01:56:39,891][INFO ][cluster.metadata] [instance name] [prod] create_mapping [mapping name]

Then two of these lines:

[2016-03-18 02:05:18,993][WARN ][cluster.action.shard] [instance name] [prod][1] received shard failed for [prod][1], node[node name], [R], s[INITIALIZING], indexUUID [index id], reason [shard failure [failed recovery][RecoveryFailedException[[prod][1]: Recovery failed from [instance name]{master=true} into [instance name]{master=false}]; nested: RemoteTransportException[[instance name][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[[prod][1] Phase[1] Execution failed]; nested: RecoverFilesRecoveryException[[prod][1] Failed to transfer [0] files with total size of [0b]]; nested: IllegalStateException[try to recover [prod][1] from primary shard with sync id but number of docs differ: 217250828 (instance name, primary) vs 217250830(instance name)]; ]]

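After that comes a series of low disk watermark errors, followed by high disk watermark errors. These errors have been occurring ever since the close/open commands were run, and they are preventing new data from being indexed.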

When I run /_cat/shards/prod, I see:

index shard prirep state             docs  store ip          node                                    
prod  0     p      STARTED      218452373 73.5gb 
prod  0     r      STARTED      218452373 73.5gb 
prod  0     r      UNASSIGNED                                                                        
prod  1     p      STARTED      217445482 73.1gb 
prod  1     r      STARTED      217445482 73.1gb 
prod  1     r      UNASSIGNED                                                                        
prod  2     r      INITIALIZING                  
prod  2     r      INITIALIZING                 
prod  2     p      STARTED      218665090 73.2gb

I also noticed that one replica of shard 2 keeps oscillating between the INITIALIZING and UNASSIGNED states.
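One way to read the listing above is to count the replica rows per shard. The sketch below parses the pasted output as plain text; the helper names are hypothetical. Two `r` rows per shard would be consistent with `number_of_replicas` being 2 rather than the default of 1, although it says nothing about how that setting came about.

```python
from collections import Counter

# The _cat/shards output from above, with the empty ip/node columns dropped.
CAT_SHARDS = """\
index shard prirep state             docs  store ip          node
prod  0     p      STARTED      218452373 73.5gb
prod  0     r      STARTED      218452373 73.5gb
prod  0     r      UNASSIGNED
prod  1     p      STARTED      217445482 73.1gb
prod  1     r      STARTED      217445482 73.1gb
prod  1     r      UNASSIGNED
prod  2     r      INITIALIZING
prod  2     r      INITIALIZING
prod  2     p      STARTED      218665090 73.2gb
"""


def shard_rows(cat_output: str):
    """Yield (shard, prirep, state) for each data row, skipping the header."""
    for line in cat_output.splitlines()[1:]:
        fields = line.split()
        if len(fields) >= 4:
            yield int(fields[1]), fields[2], fields[3]


def replicas_per_shard(cat_output: str) -> dict:
    """Count replica rows (prirep == 'r') for each shard number."""
    counts = Counter()
    for shard, prirep, _state in shard_rows(cat_output):
        if prirep == "r":
            counts[shard] += 1
    return dict(counts)


def copies_by_state(cat_output: str) -> dict:
    """Count shard copies (primaries and replicas) in each state."""
    return dict(Counter(state for _shard, _prirep, state in shard_rows(cat_output)))


print(replicas_per_shard(CAT_SHARDS))  # {0: 2, 1: 2, 2: 2}
print(copies_by_state(CAT_SHARDS))
```

With three shards of roughly 73 GB each, every additional replica adds about 220 GB of data that has to fit somewhere in the cluster.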

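I'm really hoping someone can advise on the best way forward, because this problem gets more painful every day. The best plan I can come up with right now is to back up all the data, update the index settings to 0 replicas (to get rid of the unassigned shards), and then update them again to 1 replica (since I have a feeling the recovery process may have inadvertently added a replica). I can't figure out how to confirm this theory. The most I can tell is that we did not override the defaults in elasticsearch.yml (the default is 1 replica), and that our EC2 instances do not seem large enough to hold 2 shards on the same instance. I would really like to know whether anyone has ideas about how and why closing/opening an index would cause disk usage to spike. The Elasticsearch documentation mentions that closing an index can cause this, but it does not give much other context.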
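The plan above could be sketched roughly as follows, using only the Python standard library and the index settings endpoint (`/<index>/_settings`). The address `http://localhost:9200` and the helper names are assumptions for illustration. The code only builds the requests; nothing is sent until `send()` is called explicitly.

```python
import json
import urllib.request

# Assumption: a cluster node is reachable here; adjust for the real cluster.
ES_URL = "http://localhost:9200"


def get_settings_request(index: str) -> urllib.request.Request:
    """Build GET /<index>/_settings to read the current number_of_replicas."""
    return urllib.request.Request(f"{ES_URL}/{index}/_settings", method="GET")


def set_replicas_request(index: str, replicas: int) -> urllib.request.Request:
    """Build PUT /<index>/_settings with a new replica count."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response (requires a live cluster)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Inspect what would be sent, without touching the cluster.
drop = set_replicas_request("prod", 0)
print(drop.get_method(), drop.full_url, drop.data.decode("utf-8"))
```

Against a live cluster, `send(get_settings_request("prod"))` would show the replica count actually in effect, which is one way to test the theory before changing anything. `send(set_replicas_request("prod", 0))` followed by `send(set_replicas_request("prod", 1))` would then carry out the two steps described above.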

If it would help, I can quickly provide any other information. Thanks (so much) in advance!

0 answers:

No answers yet.