Question

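I don't think I have ever even touched this index, but it has put my whole cluster into a red state. I don't know what it is or how to fix it. I tried adding another node, but that didn't help. In the index management view I can see that it is the only red index. The problem index is .opendistro-ism-config. I have tried changing the index's replica count, adding nodes, and so on, to no avail.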

Edit

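As requested by @Val, I have added the query below. My index stays red, and AWS, where the cluster is deployed, keeps spamming me with alerts. I have allocated my other indices, so I removed them from the shard_sizes output, leaving only the problematic one. I have 4 x t2.small nodes with 35 GiB SSDs, so there is plenty of spare space in the cluster. This isn't my production cluster, so it's not too bad, but it is annoying.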

https://{{ES_DOMAIN}}/_cluster/allocation/explain?include_disk_info&include_yes_decisions

{
    "index": ".opendistro-ism-config",
    "shard": 1,
    "primary": true,
    "current_state": "unassigned",
    "unassigned_info": {
        "reason": "ALLOCATION_FAILED",
        "at": "2020-08-01T09:18:40.288Z",
        "failed_allocation_attempts": 5,
        "details": "failed shard on node [ex3PL3THRHmAxkvMjOwrQQ]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[.opendistro-ism-config][1]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]]; ",
        "last_allocation_status": "no_valid_shard_copy"
    },
    "cluster_info": {
        "nodes": {
            "KnCBTiL1TZCGz1DNYfm9_A": {
                "node_name": "ef9116cc46563e2c73d12eb7a8887f4c",
                "least_available": {
                    "total_bytes": 36722737152,
                    "used_bytes": 2143232000,
                    "free_bytes": 34579505152,
                    "free_disk_percent": 94.2,
                    "used_disk_percent": 5.8
                },
                "most_available": {
                    "total_bytes": 36722737152,
                    "used_bytes": 2143232000,
                    "free_bytes": 34579505152,
                    "free_disk_percent": 94.2,
                    "used_disk_percent": 5.8
                }
            },
            "90rKZw_SSOSlOGWv_WyQQQ": {
                "node_name": "45cfd2c275112972c5e68e7e00295d45",
                "least_available": {
                    "total_bytes": 36722737152,
                    "used_bytes": 2144980992,
                    "free_bytes": 34577756160,
                    "free_disk_percent": 94.2,
                    "used_disk_percent": 5.8
                },
                "most_available": {
                    "total_bytes": 36722737152,
                    "used_bytes": 2144980992,
                    "free_bytes": 34577756160,
                    "free_disk_percent": 94.2,
                    "used_disk_percent": 5.8
                }
            },
            "2F_QTYueTs69Q7KhCped9w": {
                "node_name": "a8314d5f13c0043f8454997d973e8c03",
                "least_available": {
                    "total_bytes": 36722737152,
                    "used_bytes": 1957380096,
                    "free_bytes": 34765357056,
                    "free_disk_percent": 94.7,
                    "used_disk_percent": 5.3
                },
                "most_available": {
                    "total_bytes": 36722737152,
                    "used_bytes": 1957380096,
                    "free_bytes": 34765357056,
                    "free_disk_percent": 94.7,
                    "used_disk_percent": 5.3
                }
            },
            "8-oMtA69QvO3bKTAAUPeBw": {
                "node_name": "9c042bb3814270c16b4fba03ff85208d",
                "least_available": {
                    "total_bytes": 36722737152,
                    "used_bytes": 2140692480,
                    "free_bytes": 34582044672,
                    "free_disk_percent": 94.2,
                    "used_disk_percent": 5.8
                },
                "most_available": {
                    "total_bytes": 36722737152,
                    "used_bytes": 2140692480,
                    "free_bytes": 34582044672,
                    "free_disk_percent": 94.2,
                    "used_disk_percent": 5.8
                }
            }
        },
        "shard_sizes": {
            "[.opendistro-ism-config][2][r]_bytes": 56497,
            "[.opendistro-ism-config][0][p]_bytes": 53651,
            "[.opendistro-ism-config][0][r]_bytes": 53651,
            "[.opendistro-ism-config][4][p]_bytes": 33157,
            "[.opendistro-ism-config][2][p]_bytes": 56497
        }
    },
    "can_allocate": "no_valid_shard_copy",
    "allocate_explanation": "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster",
    "node_allocation_decisions": [
        {
            "node_id": "2F_QTYueTs69Q7KhCped9w",
            "node_name": "a8314d5f13c0043f8454997d973e8c03",
            "node_decision": "no",
            "store": {
                "found": false
            }
        },
        {
            "node_id": "8-oMtA69QvO3bKTAAUPeBw",
            "node_name": "9c042bb3814270c16b4fba03ff85208d",
            "node_decision": "no",
            "store": {
                "found": false
            }
        },
        {
            "node_id": "90rKZw_SSOSlOGWv_WyQQQ",
            "node_name": "45cfd2c275112972c5e68e7e00295d45",
            "node_decision": "no",
            "store": {
                "found": false
            }
        },
        {
            "node_id": "KnCBTiL1TZCGz1DNYfm9_A",
            "node_name": "ef9116cc46563e2c73d12eb7a8887f4c",
            "node_decision": "no",
            "store": {
                "found": false
            }
        }
    ]
}
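The allocation-explain response is long. Here is a minimal Python sketch that pulls out only the fields that matter for a stuck primary. The function name `summarize_allocation_explain` is hypothetical, and the sample is a trimmed copy of the response posted above. The key signal is that no node reports `"store": {"found": true}`, so there is no on-disk copy left to allocate.

```python
import json
from typing import Any


def summarize_allocation_explain(resp: dict[str, Any]) -> dict[str, Any]:
    """Extract the fields relevant to an unassigned shard from an
    _cluster/allocation/explain response (hypothetical helper)."""
    info = resp.get("unassigned_info", {})
    decisions = resp.get("node_allocation_decisions", [])
    # A node holds a usable copy only if its store section reports found == True.
    nodes_with_copy = [
        d["node_name"] for d in decisions if d.get("store", {}).get("found")
    ]
    role = "[p]" if resp.get("primary") else "[r]"
    return {
        "shard": f'{resp["index"]}[{resp["shard"]}]{role}',
        "reason": info.get("reason"),
        "failed_attempts": info.get("failed_allocation_attempts", 0),
        "can_allocate": resp.get("can_allocate"),
        "nodes_with_copy": nodes_with_copy,
    }


# Trimmed copy of the response posted in the question.
sample = json.loads("""
{
  "index": ".opendistro-ism-config",
  "shard": 1,
  "primary": true,
  "unassigned_info": {"reason": "ALLOCATION_FAILED", "failed_allocation_attempts": 5},
  "can_allocate": "no_valid_shard_copy",
  "node_allocation_decisions": [
    {"node_name": "a8314d5f13c0043f8454997d973e8c03", "store": {"found": false}},
    {"node_name": "9c042bb3814270c16b4fba03ff85208d", "store": {"found": false}}
  ]
}
""")

summary = summarize_allocation_explain(sample)
print(summary)
```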

Answer 1

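A RED cluster status means that one or more primary shards are unassigned. Either the index has no replica of that primary shard, or Elasticsearch could not promote a replica to primary.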
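To make the RED/YELLOW/GREEN rule concrete, here is a simplified Python sketch. It assumes shard rows shaped like the first four columns of `GET _cat/shards` (index, shard, prirep, state). The function name `cluster_color` and the sample rows are hypothetical. It treats only STARTED and RELOCATING copies as active. It therefore ignores edge cases such as a brand-new index whose primaries are still initializing, which Elasticsearch may report as yellow rather than red.

```python
# Each row mirrors the first four columns of GET _cat/shards:
# (index, shard, prirep, state), where prirep is "p" (primary) or "r" (replica).
Row = tuple[str, int, str, str]

# Simplification: only these states count as an active, allocated copy.
ACTIVE = {"STARTED", "RELOCATING"}


def cluster_color(rows: list[Row]) -> str:
    """Approximate cluster health from shard rows (hypothetical helper)."""
    if any(prirep == "p" and state not in ACTIVE for _, _, prirep, state in rows):
        return "red"  # at least one primary is not active
    if any(prirep == "r" and state not in ACTIVE for _, _, prirep, state in rows):
        return "yellow"  # all primaries active, but some replica is not
    return "green"


# Hypothetical rows loosely based on the shard_sizes in the question.
rows = [
    (".opendistro-ism-config", 0, "p", "STARTED"),
    (".opendistro-ism-config", 0, "r", "STARTED"),
    (".opendistro-ism-config", 1, "p", "UNASSIGNED"),
    (".opendistro-ism-config", 2, "p", "STARTED"),
    (".opendistro-ism-config", 2, "r", "STARTED"),
]
print(cluster_color(rows))  # red
```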

Follow the official ES blog to troubleshoot the issue.

If no replica copy of the lost primary shard exists, adding another node will not help.
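The reason a new node cannot help is that it starts with no copy of the shard's data. Here is a small Python sketch of that check. It assumes the same `(index, shard, prirep, state)` rows as `GET _cat/shards`; the function name `primaries_without_copy` and the rows are hypothetical. It lists unassigned primaries that have no STARTED replica available for promotion.

```python
Row = tuple[str, int, str, str]  # (index, shard, prirep, state), as in _cat/shards


def primaries_without_copy(rows: list[Row]) -> list[tuple[str, int]]:
    """Return (index, shard) pairs whose primary is unassigned and that have
    no STARTED replica that could be promoted in its place."""
    started_replicas = {
        (index, shard)
        for index, shard, prirep, state in rows
        if prirep == "r" and state == "STARTED"
    }
    return sorted(
        (index, shard)
        for index, shard, prirep, state in rows
        if prirep == "p" and state == "UNASSIGNED"
        and (index, shard) not in started_replicas
    )


rows = [
    (".opendistro-ism-config", 0, "p", "STARTED"),
    (".opendistro-ism-config", 0, "r", "STARTED"),
    (".opendistro-ism-config", 1, "p", "UNASSIGNED"),
    (".opendistro-ism-config", 1, "r", "UNASSIGNED"),
]
print(primaries_without_copy(rows))  # [('.opendistro-ism-config', 1)]
```

Any shard this returns needs a surviving on-disk copy to be found, restored from a snapshot, or deliberately recreated empty. More nodes alone do not change the result.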

Answer 2

One workaround that gets the cluster working again is to reroute the shards manually.

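Cause: this usually happens when a node holding a primary shard loses its connection to the master while no replica of that shard is allocated. When the node rejoins the cluster, the shard copy allocated locally on it has not yet released the resources it was using (here, the in-memory shard lock from the error above). By then the master has already made 5 failed attempts to allocate the shard back to that node.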

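After 5 failed allocation attempts (the default value of the index.allocation.max_retries setting), the master gives up, and allocation has to be triggered again manually.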
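The give-up behaviour can be modelled in a few lines of Python. This is a sketch, not Elasticsearch code: the function name `master_will_retry` is hypothetical. It only mirrors the rule that automatic allocation stops once the failure count reaches `index.allocation.max_retries`, which is why `failed_allocation_attempts` is stuck at 5 in the question.

```python
DEFAULT_MAX_RETRIES = 5  # default value of index.allocation.max_retries


def master_will_retry(failed_attempts: int, max_retries: int = DEFAULT_MAX_RETRIES) -> bool:
    """Model of the retry limit: keep allocating only while the number of
    failed attempts is below the limit (hypothetical helper)."""
    return failed_attempts < max_retries


# Simulate the question's situation: every attempt fails because the
# shard lock is still held on the rejoining node.
attempts = 0
while master_will_retry(attempts):
    attempts += 1  # one more failed allocation attempt

print(attempts)  # 5
```

Once the counter reaches the limit, nothing retries on its own. The `retry_failed` flag on `_cluster/reroute` asks the master to retry those shards anyway.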

Solution: run the following command to fix the problem:

curl -XPOST 'localhost:9200/_cluster/reroute?retry_failed'
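If you prefer scripting this over curl, here is a minimal standard-library Python equivalent. The function names are hypothetical, and it assumes an unsecured endpoint like the `localhost:9200` in the curl command. On an AWS-managed domain the endpoint is the domain's HTTPS URL, and depending on the access policy the request may need to be signed, which this sketch does not do.

```python
import urllib.parse
import urllib.request


def reroute_retry_failed_url(base_url: str) -> str:
    """Build the _cluster/reroute URL with retry_failed enabled."""
    query = urllib.parse.urlencode({"retry_failed": "true"})
    return f"{base_url.rstrip('/')}/_cluster/reroute?{query}"


def retry_failed_allocations(base_url: str, timeout: float = 30.0) -> bytes:
    """POST to _cluster/reroute?retry_failed=true and return the raw response body."""
    req = urllib.request.Request(reroute_retry_failed_url(base_url), method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

Usage: `retry_failed_allocations("http://localhost:9200")`. Afterwards, re-run the allocation-explain query from the question to confirm that the shard was assigned.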

Why is this index red: opendistro-ism-config
