Question

Everything in my cluster was working fine, but today I noticed that no logs are arriving from Packetbeat and the shard health is red:

When I run GET _cat/shards, I get something like this:

packetbeat-7.9.3-2020.10.28-000001                         2 p STARTED      11428    3.8mb 10.13.81.12 VSELK-MASTER-02
packetbeat-7.9.3-2020.10.28-000001                         2 r STARTED      11428    3.8mb 10.13.81.13 VSELK-MASTER-03
packetbeat-7.9.3-2020.10.28-000001                         9 r STARTED      11402    3.8mb 10.13.81.12 VSELK-MASTER-02
packetbeat-7.9.3-2020.10.28-000001                         9 p STARTED      11402    3.8mb 10.13.81.21 VSELK-DATA-01
packetbeat-7.9.3-2020.10.28-000001                         4 p STARTED      11619      4mb 10.13.81.21 VSELK-DATA-01
packetbeat-7.9.3-2020.10.28-000001                         4 r STARTED      11619    3.9mb 10.13.81.22 VSELK-DATA-02
packetbeat-7.9.3-2020.10.28-000001                         5 r STARTED      11567    3.8mb 10.13.81.21 VSELK-DATA-01
packetbeat-7.9.3-2020.10.28-000001                         5 p STARTED      11567    3.9mb 10.13.81.22 VSELK-DATA-02
packetbeat-7.9.3-2020.10.28-000001                         1 r STARTED      11553    3.8mb 10.13.81.11 VSELK-MASTER-01
packetbeat-7.9.3-2020.10.28-000001                         1 p STARTED      11553    3.9mb 10.13.81.22 VSELK-DATA-02
packetbeat-7.9.3-2020.10.28-000001                         7 r UNASSIGNED                              
packetbeat-7.9.3-2020.10.28-000001                         7 p UNASSIGNED                              
packetbeat-7.9.3-2020.10.28-000001                         6 r UNASSIGNED                              
packetbeat-7.9.3-2020.10.28-000001                         6 p UNASSIGNED                              
packetbeat-7.9.3-2020.10.28-000001                         8 r STARTED      11630      4mb 10.13.81.12 VSELK-MASTER-02
packetbeat-7.9.3-2020.10.28-000001                         8 p STARTED      11630    3.9mb 10.13.81.21 VSELK-DATA-01
packetbeat-7.9.3-2020.10.28-000001                         3 p STARTED      11495      4mb 10.13.81.12 VSELK-MASTER-02
packetbeat-7.9.3-2020.10.28-000001                         3 r STARTED      11495    3.7mb 10.13.81.13 VSELK-MASTER-03
packetbeat-7.9.3-2020.10.28-000001                         0 r STARTED      11713      4mb 10.13.81.11 VSELK-MASTER-01
packetbeat-7.9.3-2020.10.28-000001                         0 p STARTED      11713      4mb 10.13.81.22 VSELK-DATA-02
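In this listing shards 6 and 7 have lost both their primary and their replica, and an unassigned primary is what turns an index red. To pull only the problem rows out of a longer listing, the output can be filtered with awk. This is a sketch: the function name list_unassigned is hypothetical, and it assumes the default `_cat/shards` column order (index, shard, prirep, state, docs, store, ip, node) and a cluster at localhost:9200 without authentication.

```shell
# Hypothetical helper: print index, shard number and copy type
# (p = primary, r = replica) for every UNASSIGNED row of plain
# `_cat/shards` output read on stdin.
# Assumes default columns: index shard prirep state docs store ip node.
list_unassigned() {
  awk '$4 == "UNASSIGNED" { print $1, $2, $3 }'
}

# Example against a live cluster (assumes localhost:9200, no auth):
#   curl -s 'localhost:9200/_cat/shards' | list_unassigned
# _cat/shards can also report why each copy is unassigned:
#   curl -s 'localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason'
```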

When I run GET /_cluster/allocation/explain, I get:

{
  "index" : "packetbeat-7.9.2-2020.10.22-000001",
  "shard" : 6,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2020-10-28T13:22:03.006Z",
    "failed_allocation_attempts" : 5,
    "details" : """failed shard on node [RCeMt0uXQie_ax_Sp22hLw]: failed to create shard, failure java.io.IOException: failed to obtain in-memory shard lock
    at org.elasticsearch.index.IndexService.createShard(IndexService.java:489)
    at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:763)
    at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:176)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:607)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:584)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:242)
    at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:504)
    at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:494)
    at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:471)
    at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:418)
    at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:162)
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:674)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
    at java.lang.Thread.run(Thread.java:832)
Caused by: [packetbeat-7.9.2-2020.10.22-000001/RRAnRZrrRZiihscJ3bymig][[packetbeat-7.9.2-2020.10.22-000001][6]] org.elasticsearch.env.ShardLockObtainFailedException: [packetbeat-7.9.2-2020.10.22-000001][6]: obtaining shard lock for [starting shard] timed out after [5000ms], lock already held for [closing shard] with age [199852ms]
    at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:869)
    at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:775)
    at org.elasticsearch.index.IndexService.createShard(IndexService.java:409)
    ... 16 more
""",
    "last_allocation_status" : "no"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes that hold an in-sync shard copy",
  "node_allocation_decisions" : [
    {
      "node_id" : "A_nOoYrdSSOAHNQrhfveNA",
      "node_name" : "VSELK-DATA-02",
      "transport_address" : "10.13.81.22:9300",
      "node_attributes" : {
        "ml.machine_memory" : "8365424640",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true",
        "data" : "cold",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "store" : {
        "found" : false
      }
    },
    {
      "node_id" : "RCeMt0uXQie_ax_Sp22hLw",
      "node_name" : "VSELK-MASTER-03",
      "transport_address" : "10.13.81.13:9300",
      "node_attributes" : {
        "ml.machine_memory" : "8365068288",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true",
        "data" : "hot",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "store" : {
        "in_sync" : true,
        "allocation_id" : "nMvn4c4vQp2efQQtIeKzlg"
      },
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "NO",
          "explanation" : """shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-28T13:22:03.006Z], failed_attempts[5], failed_nodes[[hHHRtd5HTCKJgLTBtgDbOw, RCeMt0uXQie_ax_Sp22hLw]], delayed=false, details[failed shard on node [RCeMt0uXQie_ax_Sp22hLw]: failed to create shard, failure java.io.IOException: failed to obtain in-memory shard lock
    at org.elasticsearch.index.IndexService.createShard(IndexService.java:489)
    at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:763)
    at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:176)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:607)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:584)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:242)
    at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:504)
    at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:494)
    at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:471)
    at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:418)
    at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:162)
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:674)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
    at java.lang.Thread.run(Thread.java:832)
Caused by: [packetbeat-7.9.2-2020.10.22-000001/RRAnRZrrRZiihscJ3bymig][[packetbeat-7.9.2-2020.10.22-000001][6]] org.elasticsearch.env.ShardLockObtainFailedException: [packetbeat-7.9.2-2020.10.22-000001][6]: obtaining shard lock for [starting shard] timed out after [5000ms], lock already held for [closing shard] with age [199852ms]
    at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:869)
    at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:775)
    at org.elasticsearch.index.IndexService.createShard(IndexService.java:409)
    ... 16 more
], allocation_status[deciders_no]]]"""
        }
      ]
    },
    {
      "node_id" : "hHHRtd5HTCKJgLTBtgDbOw",
      "node_name" : "VSELK-MASTER-01",
      "transport_address" : "10.13.81.11:9300",
      "node_attributes" : {
        "ml.machine_memory" : "8365068288",
        "xpack.installed" : "true",
        "data" : "hot",
        "transform.node" : "true",
        "ml.max_open_jobs" : "20"
      },
      "node_decision" : "no",
      "store" : {
        "in_sync" : true,
        "allocation_id" : "ByqJGtQSQT-p8dCCfk3VlA"
      },
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "NO",
          "explanation" : """shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-28T13:22:03.006Z], failed_attempts[5], failed_nodes[[hHHRtd5HTCKJgLTBtgDbOw, RCeMt0uXQie_ax_Sp22hLw]], delayed=false, details[failed shard on node [RCeMt0uXQie_ax_Sp22hLw]: failed to create shard, failure java.io.IOException: failed to obtain in-memory shard lock
    at org.elasticsearch.index.IndexService.createShard(IndexService.java:489)
    at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:763)
    at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:176)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:607)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:584)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:242)
    at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:504)
    at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:494)
    at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:471)
    at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:418)
    at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:162)
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:674)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
    at java.lang.Thread.run(Thread.java:832)
Caused by: [packetbeat-7.9.2-2020.10.22-000001/RRAnRZrrRZiihscJ3bymig][[packetbeat-7.9.2-2020.10.22-000001][6]] org.elasticsearch.env.ShardLockObtainFailedException: [packetbeat-7.9.2-2020.10.22-000001][6]: obtaining shard lock for [starting shard] timed out after [5000ms], lock already held for [closing shard] with age [199852ms]
    at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:869)
    at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:775)
    at org.elasticsearch.index.IndexService.createShard(IndexService.java:409)
    ... 16 more
], allocation_status[deciders_no]]]"""
        }
      ]
    },
    {
      "node_id" : "k_SgmMDMRfGi-IFLbI-cRw",
      "node_name" : "VSELK-MASTER-02",
      "transport_address" : "10.13.81.12:9300",
      "node_attributes" : {
        "ml.machine_memory" : "8365056000",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true",
        "data" : "hot",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "store" : {
        "found" : false
      }
    },
    {
      "node_id" : "r4V_KqZDQ7mYi7AZea5eXQ",
      "node_name" : "VSELK-DATA-01",
      "transport_address" : "10.13.81.21:9300",
      "node_attributes" : {
        "ml.machine_memory" : "8365424640",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true",
        "data" : "warm",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "store" : {
        "found" : false
      }
    }
  ]
}
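Called without a request body, the allocation explain API describes the first unassigned shard it finds, which is probably why the output above is about shard 6 of packetbeat-7.9.2-2020.10.22-000001 rather than one of the 7.9.3 shards listed earlier. To ask about a particular copy, pass `index`, `shard` and `primary` in the body. The sketch below wraps that in a hypothetical shell function, explain_shard, and assumes a cluster at localhost:9200 (overridable through a hypothetical ES_URL variable) with no authentication.

```shell
# Assumed endpoint; override with ES_URL=... if the cluster lives elsewhere.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: explain the allocation of one specific shard copy
# instead of whichever unassigned shard Elasticsearch picks by itself.
#   $1 = index name, $2 = shard number, $3 = true for primary, false for replica
explain_shard() {
  index="$1"; shard="$2"; primary="${3:-true}"
  curl -s -X GET "$ES_URL/_cluster/allocation/explain?pretty" \
    -H 'Content-Type: application/json' \
    -d "{\"index\": \"$index\", \"shard\": $shard, \"primary\": $primary}"
}

# Example: the primary of shard 7 from the _cat/shards listing above.
#   explain_shard packetbeat-7.9.3-2020.10.28-000001 7 true
```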

Can someone tell me what causes this error and how to fix it? (For context, my cluster has 5 nodes, 3 master nodes and 2 data nodes, and all of them are up.)

Thanks for your help!

Answer 1

You can resolve this by following the related GitHub issue, in particular this comment.

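In short, you should try the following, safer command, which is also the one the max_retry decider suggests in the explain output: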

curl -XPOST 'localhost:9200/_cluster/reroute?retry_failed=true'
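The retry can be followed by a health check so you can see whether the shards actually came back. A minimal sketch, assuming a cluster at localhost:9200 with no authentication; retry_and_wait and ES_URL are hypothetical names. `wait_for_status=yellow` makes the health call block until every primary is assigned or the timeout expires. If the shard lock is still held by the closing shard, the allocation can fail again, in which case rerunning the allocation explain request shows the new failure.

```shell
# Assumed endpoint; override with ES_URL=... if the cluster lives elsewhere.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: retry allocations that hit the max_retry limit,
# then wait (up to 60s) for the cluster to leave red.
retry_and_wait() {
  # Reset the failed-allocation counter and let the allocator try again.
  curl -s -X POST "$ES_URL/_cluster/reroute?retry_failed=true" > /dev/null
  # Block until health is at least yellow (all primaries assigned) or the
  # timeout expires; the response contains "timed_out": true in that case.
  curl -s "$ES_URL/_cluster/health?wait_for_status=yellow&timeout=60s&pretty"
}
```

Run retry_and_wait and check the status and timed_out fields of the printed health response.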
