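I don't think I have ever even touched this index, yet it is putting my whole cluster into red status. I don't know what it is or how to fix it. I tried adding another node, but that did not help.
In the index management view I can see that it is the only red index. The problem index is .opendistro-ism-config. I have tried changing the index's replica count, adding nodes, and so on, all to no avail.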
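Edit
As requested by @Val, I have added the query below. The index stays red, and alarms keep spamming me on AWS, where the cluster is deployed. My other indices are allocated, so I removed them from the shard_sizes output, leaving only the problem index. I have 4 x t2.small instances with 35 GiB SSD each, and the cluster has plenty of spare disk space. This is not my production cluster, so it is not critical, but it is annoying.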
https://{{ES_DOMAIN}}/_cluster/allocation/explain?include_disk_info&include_yes_decisions
{
"index": ".opendistro-ism-config",
"shard": 1,
"primary": true,
"current_state": "unassigned",
"unassigned_info": {
"reason": "ALLOCATION_FAILED",
"at": "2020-08-01T09:18:40.288Z",
"failed_allocation_attempts": 5,
"details": "failed shard on node [ex3PL3THRHmAxkvMjOwrQQ]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[.opendistro-ism-config][1]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]]; ",
"last_allocation_status": "no_valid_shard_copy"
},
"cluster_info": {
"nodes": {
"KnCBTiL1TZCGz1DNYfm9_A": {
"node_name": "ef9116cc46563e2c73d12eb7a8887f4c",
"least_available": {
"total_bytes": 36722737152,
"used_bytes": 2143232000,
"free_bytes": 34579505152,
"free_disk_percent": 94.2,
"used_disk_percent": 5.8
},
"most_available": {
"total_bytes": 36722737152,
"used_bytes": 2143232000,
"free_bytes": 34579505152,
"free_disk_percent": 94.2,
"used_disk_percent": 5.8
}
},
"90rKZw_SSOSlOGWv_WyQQQ": {
"node_name": "45cfd2c275112972c5e68e7e00295d45",
"least_available": {
"total_bytes": 36722737152,
"used_bytes": 2144980992,
"free_bytes": 34577756160,
"free_disk_percent": 94.2,
"used_disk_percent": 5.8
},
"most_available": {
"total_bytes": 36722737152,
"used_bytes": 2144980992,
"free_bytes": 34577756160,
"free_disk_percent": 94.2,
"used_disk_percent": 5.8
}
},
"2F_QTYueTs69Q7KhCped9w": {
"node_name": "a8314d5f13c0043f8454997d973e8c03",
"least_available": {
"total_bytes": 36722737152,
"used_bytes": 1957380096,
"free_bytes": 34765357056,
"free_disk_percent": 94.7,
"used_disk_percent": 5.3
},
"most_available": {
"total_bytes": 36722737152,
"used_bytes": 1957380096,
"free_bytes": 34765357056,
"free_disk_percent": 94.7,
"used_disk_percent": 5.3
}
},
"8-oMtA69QvO3bKTAAUPeBw": {
"node_name": "9c042bb3814270c16b4fba03ff85208d",
"least_available": {
"total_bytes": 36722737152,
"used_bytes": 2140692480,
"free_bytes": 34582044672,
"free_disk_percent": 94.2,
"used_disk_percent": 5.8
},
"most_available": {
"total_bytes": 36722737152,
"used_bytes": 2140692480,
"free_bytes": 34582044672,
"free_disk_percent": 94.2,
"used_disk_percent": 5.8
}
}
},
"shard_sizes": {
"[.opendistro-ism-config][2][r]_bytes": 56497,
"[.opendistro-ism-config][0][p]_bytes": 53651,
"[.opendistro-ism-config][0][r]_bytes": 53651,
"[.opendistro-ism-config][4][p]_bytes": 33157,
"[.opendistro-ism-config][2][p]_bytes": 56497
}
},
"can_allocate": "no_valid_shard_copy",
"allocate_explanation": "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster",
"node_allocation_decisions": [
{
"node_id": "2F_QTYueTs69Q7KhCped9w",
"node_name": "a8314d5f13c0043f8454997d973e8c03",
"node_decision": "no",
"store": {
"found": false
}
},
{
"node_id": "8-oMtA69QvO3bKTAAUPeBw",
"node_name": "9c042bb3814270c16b4fba03ff85208d",
"node_decision": "no",
"store": {
"found": false
}
},
{
"node_id": "90rKZw_SSOSlOGWv_WyQQQ",
"node_name": "45cfd2c275112972c5e68e7e00295d45",
"node_decision": "no",
"store": {
"found": false
}
},
{
"node_id": "KnCBTiL1TZCGz1DNYfm9_A",
"node_name": "ef9116cc46563e2c73d12eb7a8887f4c",
"node_decision": "no",
"store": {
"found": false
}
}
]
}
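The fields that matter most in the output above can be pulled out programmatically. The following is a minimal sketch, not part of the original question: the function name `summarize_explain` and the trimmed `sample` dictionary are illustrative assumptions, and only field names that appear in the response above are used.

```python
def summarize_explain(explain: dict) -> dict:
    """Pull the key fields out of a _cluster/allocation/explain response."""
    info = explain.get("unassigned_info", {})
    decisions = explain.get("node_allocation_decisions", [])
    # A node reports an on-disk copy only if its "store" section has found=true.
    nodes_with_copy = [
        d.get("node_name")
        for d in decisions
        if d.get("store", {}).get("found", False)
    ]
    return {
        "shard": f'{explain.get("index")}[{explain.get("shard")}]',
        "primary": explain.get("primary"),
        "reason": info.get("reason"),
        "failed_attempts": info.get("failed_allocation_attempts"),
        "can_allocate": explain.get("can_allocate"),
        "nodes_with_copy": nodes_with_copy,
    }


# Trimmed copy of the response above (two of the four nodes, for brevity).
sample = {
    "index": ".opendistro-ism-config",
    "shard": 1,
    "primary": True,
    "unassigned_info": {
        "reason": "ALLOCATION_FAILED",
        "failed_allocation_attempts": 5,
    },
    "can_allocate": "no_valid_shard_copy",
    "node_allocation_decisions": [
        {"node_name": "a8314d5f13c0043f8454997d973e8c03", "store": {"found": False}},
        {"node_name": "9c042bb3814270c16b4fba03ff85208d", "store": {"found": False}},
    ],
}

summary = summarize_explain(sample)
print(summary)
```

An empty `nodes_with_copy` list matches the `allocate_explanation` text in the response: no node currently in the cluster reports a copy of primary shard 1.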
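Answer 0 (score: 0)
RED cluster status means that one or more primary shards are missing, and that the index either has no replica of that primary shard or ES could not promote a replica to primary.
Please follow the official ES blog to troubleshoot the issue.
If you do not have a replica copy of the lost primary shard, adding another node will not help either.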
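To see which indices are responsible for a red status, the cluster health API can be queried with `level=indices` and the result filtered by status. The sketch below only processes such a response; the function name `red_indices`, the sample data, and the second index name `my-logs` are hypothetical and chosen for illustration.

```python
def red_indices(health: dict) -> list:
    """Return the names of indices whose status is "red".

    `health` is assumed to be the parsed JSON body of
    GET /_cluster/health?level=indices.
    """
    indices = health.get("indices", {})
    return sorted(
        name for name, details in indices.items()
        if details.get("status") == "red"
    )


# Hypothetical, heavily trimmed response for illustration only.
sample_health = {
    "status": "red",
    "indices": {
        ".opendistro-ism-config": {"status": "red", "unassigned_shards": 1},
        "my-logs": {"status": "green", "unassigned_shards": 0},
    },
}

problem = red_indices(sample_health)
print(problem)
```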
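Answer 1 (score: 0)
One workaround to get the cluster working again is to reroute the shards manually.
Cause: this usually happens when a node loses its connection to the master node while holding a primary shard that has no replicas allocated. When the node rejoins the cluster, the shard copy allocated locally on that node has not yet released the resources it was using (hence the shard lock error), and in the meantime the master node has already made 5 failed attempts to allocate the shard to a node.
After 5 failed allocation attempts, the master gives up, and allocation has to be triggered manually to be attempted again.
Solution: run the following command to retry the failed allocation: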
curl -XPOST 'localhost:9200/_cluster/reroute?retry_failed=true'
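The same request can be sent from a script. This is a minimal sketch using only the Python standard library; the helper name `build_retry_request` and the base URL are assumptions, and the line that sends the request is commented out because it needs a reachable cluster.

```python
import urllib.request


def build_retry_request(base_url: str) -> urllib.request.Request:
    """Build a POST to _cluster/reroute that retries failed shard allocations."""
    url = base_url.rstrip("/") + "/_cluster/reroute?retry_failed=true"
    return urllib.request.Request(url, method="POST")


# Assumed endpoint; replace with your own domain (for example the ES_DOMAIN
# placeholder used in the question).
req = build_retry_request("http://localhost:9200")
print(req.get_method(), req.full_url)

# To actually send it (requires a reachable cluster):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

Rerunning the allocation explain query afterwards shows whether the shard was assigned. If every node still reports `"found": false` in its `store` section, a retry alone may not be enough, since no node holds a copy of the shard to allocate.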