Question

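My cluster is in yellow status because some shards are unassigned. What can I do about this?

I tried setting cluster.routing.allocation.disable_allocation = false on all indexes, but I think this doesn't work because I'm using version 1.1.1.

I also tried restarting all the machines, but the same thing happens.

Any ideas?

EDIT:

Cluster stats: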

{ 
  cluster_name: "elasticsearch",
  status: "red",
  timed_out: false,
  number_of_nodes: 5,
  number_of_data_nodes: 4,
  active_primary_shards: 4689,
  active_shards: 4689,
  relocating_shards: 0,
  initializing_shards: 10,
  unassigned_shards: 758
}

Answer 1

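There are many possible reasons why allocation won't happen:

You are running different versions of Elasticsearch on different nodes.
You only have one node in your cluster, but you have the number of replicas set to something other than zero.
You have insufficient disk space.
You have shard allocation disabled.
You have a firewall or SELinux enabled. With SELinux enabled but not configured properly, you will see shards stuck in INITIALIZING or RELOCATING forever.

As a general rule, you can troubleshoot things like this:

Look at the nodes in your cluster: curl -s 'localhost:9200/_cat/nodes?v'. If you only have one node, you need to set number_of_replicas to 0 (see the ES documentation or the other answers).
Look at the disk space available in your cluster: curl -s 'localhost:9200/_cat/allocation?v'
Check the cluster settings: curl 'http://localhost:9200/_cluster/settings?pretty' and look for cluster.routing settings.
Look at which shards are UNASSIGNED: curl -s 'localhost:9200/_cat/shards?v' | grep UNASS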
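
The last check can be condensed into a per-index summary, which helps when hundreds of shards are unassigned. This is a minimal sketch and not part of the original answer: count_unassigned_by_index is a hypothetical helper name, and it assumes the default _cat/shards column order (index, shard, prirep, state, ...) and a node listening on localhost:9200.

```shell
# Hypothetical helper: read `_cat/shards` output on stdin and print
# "<count> <index>" for every index that has UNASSIGNED shards,
# largest count first. Assumes the state is the 4th column.
count_unassigned_by_index() {
  awk '$4 == "UNASSIGNED" { n[$1]++ } END { for (i in n) print n[i], i }' |
    sort -rn
}

# Example (assumes Elasticsearch listens on localhost:9200):
#   curl -s 'localhost:9200/_cat/shards' | count_unassigned_by_index
```

Indices at the top of the list are the ones worth inspecting first.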

Try forcing a shard to be assigned:

curl -XPOST -d '{ "commands" : [ {
  "allocate" : {
       "index" : ".marvel-2014.05.21", 
       "shard" : 0, 
       "node" : "SOME_NODE_HERE",
       "allow_primary":true 
     } 
  } ] }' http://localhost:9200/_cluster/reroute?pretty

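Look at the response and see what it says. There will be a bunch of YESes, which are fine, and then a NO. If there is no NO, it is likely a firewall/SELinux problem.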
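
When many replica shards are unassigned, writing each reroute request by hand gets tedious. The sketch below, which is an illustration rather than part of the original answer, generates one request body per unassigned replica. replica_reroute_bodies is a hypothetical helper name; it assumes the default _cat/shards column order and deliberately leaves out allow_primary, so that setting keeps its default of false.

```shell
# Hypothetical helper: read `_cat/shards` output on stdin and print one
# reroute request body per UNASSIGNED replica shard, targeting node "$1".
# allow_primary is omitted on purpose, so primaries are never forced.
replica_reroute_bodies() {
  awk -v node="$1" '
    $3 == "r" && $4 == "UNASSIGNED" {
      printf "{\"commands\":[{\"allocate\":{\"index\":\"%s\",\"shard\":%s,\"node\":\"%s\"}}]}\n", $1, $2, node
    }'
}

# Example (assumes localhost:9200 and a data node named SOME_NODE_HERE):
#   curl -s 'localhost:9200/_cat/shards' | replica_reroute_bodies SOME_NODE_HERE |
#     while IFS= read -r body; do
#       curl -s -XPOST 'localhost:9200/_cluster/reroute?pretty' -d "$body"
#     done
```

This targets the allocate command used by the 1.x releases discussed here; from Elasticsearch 5.0 onward that command was split into allocate_replica, allocate_stale_primary and allocate_empty_primary, so the body would need adjusting.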

Answer 2

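This is a common issue arising from the default index settings, particularly when you try to replicate on a single node. To fix it, first set the replica count to zero on all existing indices: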

curl -XPUT http://localhost:9200/_settings -d '{ "number_of_replicas" :0 }'

Next, enable the cluster to reallocate shards (you can always turn this on after all is said and done):

curl -XPUT http://localhost:9200/_cluster/settings -d '
{
    "transient" : {
        "cluster.routing.allocation.enable": "all"
    }
}'

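Now sit back and watch the cluster clean up the unassigned replica shards. If you want this to apply to future indices as well, don't forget to add the following setting to your elasticsearch.yml file and restart the cluster: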

index.number_of_replicas: 0
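
Rather than polling by hand while the cluster cleans up, the cluster health API can block until a given status is reached. The following is a minimal sketch under stated assumptions: health_status is a hypothetical helper that pulls the status field out of the JSON response without needing jq, and the host and 60s timeout are illustrative values.

```shell
# Hypothetical helper: extract the value of the "status" field from a
# `_cluster/health` JSON response read on stdin (compact or pretty-printed).
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Example (assumes localhost:9200): wait up to 60s for green, then print
# whatever status the cluster actually reached.
#   curl -s 'localhost:9200/_cluster/health?wait_for_status=green&timeout=60s' |
#     health_status
```

If the printed status is still yellow or red after the timeout, some shards remain unassigned and the other checks in this thread apply.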

Answer 3

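Those unassigned shards are actually unassigned replicas of the actual shards held on the master node.

To assign these shards, you need to run a new Elasticsearch instance to create a secondary node that can carry the data replicas.

EDIT: Sometimes the unassigned shards belong to indexes that have been deleted, which makes them orphan shards that will never be assigned whether or not you add nodes. But that is not the case here!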

Answer 4

The only thing that worked for me was changing number_of_replicas (I had 2 replicas, so I changed it to 1 and then back to 2).

First:

PUT /myindex/_settings
{
    "index" : {
        "number_of_replicas" : 1
     }
}

Then:

PUT /myindex/_settings
{
    "index" : {
        "number_of_replicas" : 2
     }
}
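
The same two requests can be sent with curl outside of a console. This is a sketch only: bounce_replicas is a hypothetical helper name and ES_URL an assumed environment variable that defaults to http://localhost:9200. The Content-Type header is not needed on 1.x, but Elasticsearch 6.0 and later reject request bodies without it.

```shell
# Hypothetical helper: set number_of_replicas on index "$1" to "$2",
# then to "$3". ES_URL is an assumed environment variable.
bounce_replicas() {
  for br_count in "$2" "$3"; do
    curl -s -XPUT "${ES_URL:-http://localhost:9200}/$1/_settings" \
      -H 'Content-Type: application/json' \
      -d "{\"index\":{\"number_of_replicas\":${br_count}}}" || return 1
    echo
  done
}

# Example: drop myindex to 1 replica, then restore 2.
#   bounce_replicas myindex 1 2
```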

Answer 5

The first 2 points of Alcanzar's answer did it for me, but I had to add

"allow_primary" : true

like this:

curl -XPOST http://localhost:9200/_cluster/reroute?pretty -d '{
  "commands": [
    {
      "allocate": {
        "index": ".marvel-2014.05.21",
        "shard": 0,
        "node": "SOME_NODE_HERE",
        "allow_primary": true
      }
    }
  ]
}'

Answer 6

With newer ES versions, this should do the trick (run in Kibana DevTools):

PUT /_cluster/settings
{
  "transient" : {
    "cluster.routing.rebalance.enable" : "all"
  }
}

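However, this won't fix the root cause. In my case there were lots of unassigned shards because the default replica count was 1 while I was effectively using a single node. So I also added this line to my elasticsearch.yml: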

index.number_of_replicas: 0

Answer 7

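Check that the version of ElasticSearch is the same on every node. If it is not, ES will not allocate replica copies of an index to "older" nodes.

Using @Alcanzar's answer, you can get some diagnostic error messages back: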

curl -XPOST 'http://localhost:9200/_cluster/reroute?pretty' -d '{
  "commands": [
    {
      "allocate": {
        "index": "logstash-2016.01.31",
        "shard": 1,
        "node": "arc-elk-es3",
        "allow_primary": true
      }
    }
  ]
}'

The result is:

{
  "error" : "ElasticsearchIllegalArgumentException[[allocate] allocation of
            [logstash-2016.01.31][1] on node [arc-elk-es3]
            [Xn8HF16OTxmnQxzRzMzrlA][arc-elk-es3][inet[/172.16.102.48:9300]]{master=false} is not allowed, reason:
            [YES(shard is not allocated to same node or host)]
            [YES(node passes include/exclude/require filters)]
            [YES(primary is already active)]
            [YES(below shard recovery limit of [2])]
            [YES(allocation disabling is ignored)]
            [YES(allocation disabling is ignored)]
            [YES(no allocation awareness enabled)]
            [YES(total shard limit disabled: [-1] <= 0)]
            *** [NO(target node version [1.7.4] is older than source node version [1.7.5]) ***
            [YES(enough disk for shard on node, free: [185.3gb])]
            [YES(shard not primary or relocation disabled)]]",
  "status" : 400
}
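
In a long decider list like the one above, the single NO is easy to miss. A small filter can print only the failing verdicts. This is a sketch, with show_no_deciders as a hypothetical helper name; it assumes each verdict has the NO(...) shape shown above and that the explanation text contains no nested parentheses.

```shell
# Hypothetical helper: read a reroute response on stdin and print only
# the decider verdicts of the form NO(...), one per line.
show_no_deciders() {
  grep -o 'NO([^)]*)'
}

# Example (assumes the request body is stored in reroute.json):
#   curl -s -XPOST 'http://localhost:9200/_cluster/reroute?pretty' \
#     -d @reroute.json | show_no_deciders
```

For the response above, this prints just the version-mismatch verdict.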

How to determine the version number of ElasticSearch:

adminuser@arc-elk-web:/var/log/kibana$ curl -XGET 'localhost:9200'
{
  "status" : 200,
  "name" : "arc-elk-web",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.7.5",
    "build_hash" : "00f95f4ffca6de89d68b7ccaf80d148f1f70e4d4",
    "build_timestamp" : "2016-02-02T09:55:30Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
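
Querying every server one by one gets tedious on larger clusters. As a sketch, assuming the _cat/nodes API on your version exposes the name and version columns through the h parameter, the versions of all nodes can be listed in one request and reduced to the distinct values. distinct_versions is a hypothetical helper name.

```shell
# Hypothetical helper: read "<node name> <version>" lines on stdin and
# print each distinct version once. The version is taken from the last
# field, because node names may themselves contain spaces.
distinct_versions() {
  awk 'NF >= 2 { print $NF }' | sort -u
}

# Example (assumes localhost:9200):
#   curl -s 'localhost:9200/_cat/nodes?h=name,version' | distinct_versions
```

More than one line of output means the cluster is running mixed versions.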

In my case, I had set up the apt-get repository incorrectly, so the packages got out of sync across the servers. I corrected it on all the servers with:

echo "deb http://packages.elastic.co/elasticsearch/1.7/debian stable main" | sudo tee -a /etc/apt/sources.list

And then the usual:

sudo apt-get update
sudo apt-get upgrade

And finally, restart the servers.
