I am trying to bulk-insert data into a 4-node Elasticsearch cluster with 3 data nodes.
Data node specs: 16 CPUs - 7GB RAM - 500GB SSD
I send the inserts to the non-data node; the index is split into 5 shards with 1 replica. About 250GB of data needs to be inserted.
However, after about an hour of processing, with roughly 40GB of data inserted per node and around 60% CPU and 30% RAM usage over the whole period, some shards end up stuck in the initializing state:
~$ curl -XGET 'http://localhost:9200/_cluster/health/osm?level=shards&pretty=true'
{
"cluster_name" : "elastic_osm",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 4,
"number_of_data_nodes" : 3,
"active_primary_shards" : 5,
"active_shards" : 9,
"relocating_shards" : 1,
"initializing_shards" : 1,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"indices" : {
"osm" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 9,
"relocating_shards" : 1,
"initializing_shards" : 1,
"unassigned_shards" : 0,
"shards" : {
"0" : {
"status" : "yellow",
"primary_active" : true,
"active_shards" : 1,
"relocating_shards" : 0,
"initializing_shards" : 1,
"unassigned_shards" : 0
},
"1" : {
"status" : "green",
"primary_active" : true,
"active_shards" : 2,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0
},
"2" : {
"status" : "green",
"primary_active" : true,
"active_shards" : 2,
"relocating_shards" : 1,
"initializing_shards" : 0,
"unassigned_shards" : 0
},
"3" : {
"status" : "green",
"primary_active" : true,
"active_shards" : 2,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0
},
"4" : {
"status" : "green",
"primary_active" : true,
"active_shards" : 2,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0
}
}
}
}
}
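As a sanity check, the shard arithmetic in this health output is internally consistent: 5 primaries with 1 replica each means 10 shard copies in total, and the reported active (9) plus initializing (1) copies account for all of them. A minimal sketch of that bookkeeping, using only the numbers from the response above:

```python
# Shard bookkeeping for an index with 5 primaries and 1 replica,
# using the numbers reported by the _cluster/health call above.
number_of_shards = 5
number_of_replicas = 1
total_copies = number_of_shards * (1 + number_of_replicas)  # 10

active_shards = 9
initializing_shards = 1
unassigned_shards = 0

# Every shard copy is either active, initializing, or unassigned.
assert active_shards + initializing_shards + unassigned_shards == total_copies

# "yellow" means all primaries are active but not all replicas are.
active_primary_shards = 5
if active_shards == total_copies:
    status = "green"
elif active_primary_shards == number_of_shards:
    status = "yellow"
else:
    status = "red"
print(status)  # yellow
```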
Digging deeper, I found that one node has a heap-space problem:
~$ curl -XGET 'localhost:9200/osm/_search_shards?pretty=true'
{
"nodes" : {
"1DpvDUf7SKywJrBgQqs9eg" : {
"name" : "elastic-osm-node-1",
"transport_address" : "inet[/xxx.xxx.x.x:xxxx]",
"attributes" : {
"master" : "true"
}
},
"FiBYw-v_QfO3nJQfHflf_w" : {
"name" : "elastic-osm-node-3",
"transport_address" : "inet[/xxx.xxx.x.x:x]",
"attributes" : {
"master" : "true"
}
},
"ibpt8lGiS6yDJf4e09RN9Q" : {
"name" : "elastic-osm-node-2",
"transport_address" : "inet[/xxx.xxx.x.x:xxxx]",
"attributes" : {
"master" : "true"
}
}
},
"shards" : [ [ {
"state" : "STARTED",
"primary" : true,
"node" : "ibpt8lGiS6yDJf4e09RN9Q",
"relocating_node" : null,
"shard" : 0,
"index" : "osm"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "FiBYw-v_QfO3nJQfHflf_w",
"relocating_node" : null,
"shard" : 0,
"index" : "osm",
"unassigned_info" : {
"reason" : "ALLOCATION_FAILED",
"at" : "2015-10-30T10:42:25.539Z",
"details" : "shard failure [engine failure, reason [already closed by tragic event]][OutOfMemoryError[Java heap space]]"
}
} ], [ {
"state" : "STARTED",
"primary" : true,
"node" : "FiBYw-v_QfO3nJQfHflf_w",
"relocating_node" : null,
"shard" : 1,
"index" : "osm"
}, {
"state" : "STARTED",
"primary" : false,
"node" : "1DpvDUf7SKywJrBgQqs9eg",
"relocating_node" : null,
"shard" : 1,
"index" : "osm"
} ], [ {
"state" : "RELOCATING",
"primary" : false,
"node" : "FiBYw-v_QfO3nJQfHflf_w",
"relocating_node" : "1DpvDUf7SKywJrBgQqs9eg",
"shard" : 2,
"index" : "osm"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "ibpt8lGiS6yDJf4e09RN9Q",
"relocating_node" : null,
"shard" : 2,
"index" : "osm"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "1DpvDUf7SKywJrBgQqs9eg",
"relocating_node" : "FiBYw-v_QfO3nJQfHflf_w",
"shard" : 2,
"index" : "osm"
} ], [ {
"state" : "STARTED",
"primary" : false,
"node" : "FiBYw-v_QfO3nJQfHflf_w",
"relocating_node" : null,
"shard" : 3,
"index" : "osm"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "1DpvDUf7SKywJrBgQqs9eg",
"relocating_node" : null,
"shard" : 3,
"index" : "osm"
} ], [ {
"state" : "STARTED",
"primary" : false,
"node" : "ibpt8lGiS6yDJf4e09RN9Q",
"relocating_node" : null,
"shard" : 4,
"index" : "osm"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "FiBYw-v_QfO3nJQfHflf_w",
"relocating_node" : null,
"shard" : 4,
"index" : "osm"
} ] ]
}
However, ES_HEAP_SIZE on the servers is set to half of the RAM:
~$ echo $ES_HEAP_SIZE
7233.0m
and only about 5 GB of RAM is actually in use:
~$ free -g
total used
Mem: 14 5
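For what it's worth, the reported heap size does match the usual half-of-RAM rule of thumb: 7233 MB is almost exactly half of the 14 GB that `free -g` reports. A quick check, using only the numbers from the output above:

```python
# Numbers taken from the command output above.
ram_gb = 14       # total RAM reported by `free -g`
heap_mb = 7233.0  # value of $ES_HEAP_SIZE

heap_gb = heap_mb / 1024
print(round(heap_gb, 2))                 # 7.06
assert abs(heap_gb - ram_gb / 2) < 0.1   # heap is ~50% of RAM
```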
If I wait a bit longer, the node leaves the cluster entirely and all its replicas go into the initializing state, which makes my inserts fail and stop:
{
"state" : "INITIALIZING",
"primary" : false,
"node" : "ibpt8lGiS6yDJf4e09RN9Q",
"relocating_node" : null,
"shard" : 3,
"index" : "osm",
"unassigned_info" : {
"reason" : "NODE_LEFT",
"at" : "2015-10-30T10:53:32.044Z",
"details" : "node_left[FiBYw-v_QfO3nJQfHflf_w]"
}
Config: to speed up the inserts, I use these settings in the data nodes' elasticsearch
configuration: refresh_interval: -1, threadpool.bulk.size: 16, threadpool.bulk.queue_size: 1000
Why does this happen? How can I fix it and make my bulk insert succeed? Do I need more than 50% of the RAM for the maximum heap size?
EDIT: Since tuning the Elasticsearch thread-pool settings is discouraged, I removed the threadpool parameters, but now indexing runs very slowly. Elasticsearch is not designed to ingest this much data this fast.

Answer (score: 0):
Remove these settings:
threadpool.bulk.size: 16
threadpool.bulk.queue_size: 1000
The defaults for these settings are good enough; they will not overload your cluster.
Also make sure you size your bulk indexing requests properly, as described here. Depending on your cluster and data, bulk requests need to have a certain size. You cannot just use whatever values you like in the hope of ingesting as much as possible; every cluster has its limits, and you should test yours.
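One way to follow that advice is to cap each bulk request both by document count and by payload size, rather than firing arbitrarily large batches. The sketch below is purely illustrative: `chunk_bulk_actions` is a hypothetical helper (not part of any Elasticsearch client), and the limits are assumed starting points that must be tuned against your own cluster.

```python
import json

def chunk_bulk_actions(docs, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Yield lists of documents, each small enough for one bulk request.

    max_docs and max_bytes are starting points only; the right values
    depend on your cluster and data, and must be found by testing.
    """
    batch, batch_bytes = [], 0
    for doc in docs:
        size = len(json.dumps(doc).encode("utf-8"))
        if batch and (len(batch) >= max_docs or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch

# Example: 2500 small docs with a 1000-doc cap -> 3 batches.
batches = list(chunk_bulk_actions({"id": i} for i in range(2500)))
print([len(b) for b in batches])  # [1000, 1000, 500]
```

Each yielded batch would then be sent as one bulk request; if the cluster still struggles, lower the caps rather than raising the thread-pool queue.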