Small ES cluster stuck at initializing_shards

Time: 2017-05-09 14:10:07

Tags: elasticsearch elasticsearch-2.0

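I have a 2.4.4 cluster with one server/node (esnode1) that contains only a single 220 GB index with 1 shard and zero replicas.

The index responds fine, but whenever I cleanly restart the server (an EC2 instance with 2 CPUs, 4 GB RAM, and a 500 GB SSD), the cluster status stays red with "initializing_shards" : 1 for a very long time, with no CPU or disk usage (the system is idle and not swapping).

I have raised indices.recovery.max_bytes_per_sec to 50mb and tried the instructions at https://www.elastic.co/guide/en/elasticsearch/guide/current/_rolling_restarts.html, without success.

This happens only when ES is given a 2 GB heap. With a 3 GB heap, the cluster status turns green within seconds of the restart.

I'm at a loss as to how to debug or understand this, since the logs (below) look normal. Any hints?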

/_cluster/health is:

 {
      "cluster_name" : "escluster1",
      "status" : "red",
      "timed_out" : false,
      "number_of_nodes" : 1,
      "number_of_data_nodes" : 1,
      "active_primary_shards" : 0,
      "active_shards" : 0,
      "relocating_shards" : 0,
      "initializing_shards" : 1,
      "unassigned_shards" : 0,
      "delayed_unassigned_shards" : 0,
      "number_of_pending_tasks" : 0,
      "number_of_in_flight_fetch" : 0,
      "task_max_waiting_in_queue_millis" : 0,
      "active_shards_percent_as_number" : 0.0
    }
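A response like the one above can also be checked from a script while waiting for the shard to come up. The following is a minimal sketch, not part of the original post: the helper names `fetch_health` and `describe_health` are hypothetical, and the base URL assumes the node's HTTP endpoint at 127.0.0.1:9200.

```python
import json
import urllib.request


def fetch_health(base_url="http://127.0.0.1:9200"):
    """Fetch /_cluster/health from the node (base URL is an assumption)."""
    with urllib.request.urlopen(base_url + "/_cluster/health", timeout=10) as resp:
        return json.load(resp)


def describe_health(health):
    """Summarize a cluster health response dict in one line (hypothetical helper)."""
    if health["status"] == "green":
        return "green: all shards active"
    return "{}: {} initializing, {} unassigned, {} active primaries".format(
        health["status"],
        health["initializing_shards"],
        health["unassigned_shards"],
        health["active_primary_shards"],
    )
```

Calling `describe_health(fetch_health())` every few seconds would show whether the shard ever leaves the initializing state, which is easier to follow than rereading the full JSON.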

Here are the logs after the restart:

[2017-05-04 15:00:37,975][INFO ][node                     ] [esnode1] version[2.4.4], pid[2761], build[fcbb46d/2017-01-03T11:33:16Z]
[2017-05-04 15:00:37,976][INFO ][node                     ] [esnode1] initializing ...
[2017-05-04 15:00:38,534][INFO ][plugins                  ] [esnode1] modules [reindex, lang-expression, lang-groovy], plugins [], sites []
[2017-05-04 15:00:38,563][INFO ][env                      ] [esnode1] using [1] data paths, mounts [[/mnt/esdata2 (/dev/xvdh1)]], net usable_space [226.3gb], net total_space [492gb], spins? [no], types [ext4]
[2017-05-04 15:00:38,563][INFO ][env                      ] [esnode1] heap size [1.9gb], compressed ordinary object pointers [true]
[2017-05-04 15:00:40,379][INFO ][node                     ] [esnode1] initialized
[2017-05-04 15:00:40,380][INFO ][node                     ] [esnode1] starting ...
[2017-05-04 15:00:40,501][INFO ][transport                ] [esnode1] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
[2017-05-04 15:00:40,506][INFO ][discovery                ] [esnode1] escluster1/sv3aHhUjSyueq5N4_w14mQ
[2017-05-04 15:00:43,565][INFO ][cluster.service          ] [esnode1] new_master {esnode1}{sv3aHhUjSyueq5N4_w14mQ}{127.0.0.1}{127.0.0.1:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2017-05-04 15:00:43,595][INFO ][indices.recovery         ] [esnode1] updating [indices.recovery.max_bytes_per_sec] from [40mb] to [50mb]
[2017-05-04 15:00:43,631][INFO ][http                     ] [esnode1] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[2017-05-04 15:00:43,632][INFO ][node                     ] [esnode1] started
[2017-05-04 15:00:43,651][INFO ][gateway                  ] [esnode1] recovered

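Edit 1: After switching the log level to DEBUG, with a 2 GB heap and the cluster status stuck at "red", I can see the following messages logged repeatedly every 30 seconds: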

[2017-05-10 15:58:45,985][DEBUG][index.shard              ] [esnode1] [myIndex][0] updateBufferSize: engine is closed; skipping
[2017-05-10 15:59:15,985][DEBUG][indices.memory           ] [esnode1] recalculating shard indexing buffer, total is [203.1mb] with [1] active shards, each shard set to indexing=[203.1mb], translog=[64kb]
[2017-05-10 15:59:15,990][DEBUG][index.shard              ] [esnode1] [myIndex][0] updateBufferSize: engine is closed; skipping
[2017-05-10 15:59:45,990][DEBUG][indices.memory           ] [esnode1] recalculating shard indexing buffer, total is [203.1mb] with [1] active shards, each shard set to indexing=[203.1mb], translog=[64kb]
[2017-05-10 15:59:45,997][DEBUG][index.shard              ] [esnode1] [myIndex][0] updateBufferSize: engine is closed; skipping
[2017-05-10 16:00:15,997][DEBUG][indices.memory           ] [esnode1] recalculating shard indexing buffer, total is [203.1mb] with [1] active shards, each shard set to indexing=[203.1mb], translog=[64kb]

Edit 2: Output produced with a 3 GB heap and "green" status:

_nodes/stats?filter_path=**.indices.segments:

{
  "nodes" : {
    "TeXgE1QKSMOE1xYS-miJug" : {
      "indices" : {
        "segments" : {
          "count" : 73,
          "memory_in_bytes" : 2272548617,
          "terms_memory_in_bytes" : 2269433701,
          "stored_fields_memory_in_bytes" : 3103096,
          "term_vectors_memory_in_bytes" : 0,
          "norms_memory_in_bytes" : 4672,
          "doc_values_memory_in_bytes" : 7148,
          "index_writer_memory_in_bytes" : 0,
          "index_writer_max_memory_in_bytes" : 320379289,
          "version_map_memory_in_bytes" : 0,
          "fixed_bit_set_memory_in_bytes" : 0
        }
      }
    }
  }
}

/_nodes/stats/jvm?filter_path=**.heap_used_in_bytes

{
  "cluster_name" : "escluster1",
  "nodes" : {
    "TeXgE1QKSMOE1xYS-miJug" : {
      "timestamp" : 1494501231058,
      "name" : "esnode1",
      "transport_address" : "127.0.0.1:9300",
      "host" : "127.0.0.1",
      "ip" : [ "127.0.0.1:9300", "NONE" ],
      "indices" : {
        "docs" : {
          "count" : 5352169,
          "deleted" : 0
        },
        "store" : {
          "size_in_bytes" : 234847391460,
          "throttle_time_in_millis" : 0
        },
        "indexing" : {
          "index_total" : 0,
          "index_time_in_millis" : 0,
          "index_current" : 0,
          "index_failed" : 0,
          "delete_total" : 0,
          "delete_time_in_millis" : 0,
          "delete_current" : 0,
          "noop_update_total" : 0,
          "is_throttled" : false,
          "throttle_time_in_millis" : 0
        },
        "get" : {
          "total" : 0,
          "time_in_millis" : 0,
          "exists_total" : 0,
          "exists_time_in_millis" : 0,
          "missing_total" : 0,
          "missing_time_in_millis" : 0,
          "current" : 0
        },
        "search" : {
          "open_contexts" : 0,
          "query_total" : 0,
          "query_time_in_millis" : 0,
          "query_current" : 0,
          "fetch_total" : 0,
          "fetch_time_in_millis" : 0,
          "fetch_current" : 0,
          "scroll_total" : 0,
          "scroll_time_in_millis" : 0,
          "scroll_current" : 0
        },
        "merges" : {
          "current" : 0,
          "current_docs" : 0,
          "current_size_in_bytes" : 0,
          "total" : 0,
          "total_time_in_millis" : 0,
          "total_docs" : 0,
          "total_size_in_bytes" : 0,
          "total_stopped_time_in_millis" : 0,
          "total_throttled_time_in_millis" : 0,
          "total_auto_throttle_in_bytes" : 20971520
        },
        "refresh" : {
          "total" : 1,
          "total_time_in_millis" : 14
        },
        "flush" : {
          "total" : 1,
          "total_time_in_millis" : 10
        },
        "warmer" : {
          "current" : 0,
          "total" : 3,
          "total_time_in_millis" : 6
        },
        "query_cache" : {
          "memory_size_in_bytes" : 0,
          "total_count" : 0,
          "hit_count" : 0,
          "miss_count" : 0,
          "cache_size" : 0,
          "cache_count" : 0,
          "evictions" : 0
        },
        "fielddata" : {
          "memory_size_in_bytes" : 0,
          "evictions" : 0
        },
        "percolate" : {
          "total" : 0,
          "time_in_millis" : 0,
          "current" : 0,
          "memory_size_in_bytes" : -1,
          "memory_size" : "-1b",
          "queries" : 0
        },
        "completion" : {
          "size_in_bytes" : 0
        },
        "segments" : {
          "count" : 73,
          "memory_in_bytes" : 2272548617,
          "terms_memory_in_bytes" : 2269433701,
          "stored_fields_memory_in_bytes" : 3103096,
          "term_vectors_memory_in_bytes" : 0,
          "norms_memory_in_bytes" : 4672,
          "doc_values_memory_in_bytes" : 7148,
          "index_writer_memory_in_bytes" : 0,
          "index_writer_max_memory_in_bytes" : 512000,
          "version_map_memory_in_bytes" : 0,
          "fixed_bit_set_memory_in_bytes" : 0
        },
        "translog" : {
          "operations" : 0,
          "size_in_bytes" : 43
        },
        "suggest" : {
          "total" : 0,
          "time_in_millis" : 0,
          "current" : 0
        },
        "request_cache" : {
          "memory_size_in_bytes" : 0,
          "evictions" : 0,
          "hit_count" : 0,
          "miss_count" : 0
        },
        "recovery" : {
          "current_as_source" : 0,
          "current_as_target" : 0,
          "throttle_time_in_millis" : 0
        }
      },
      "os" : {
        "timestamp" : 1494501231060,
        "cpu_percent" : 0,
        "load_average" : 0.0,
        "mem" : {
          "total_in_bytes" : 4142092288,
          "free_in_bytes" : 117051392,
          "used_in_bytes" : 4025040896,
          "free_percent" : 3,
          "used_percent" : 97
        },
        "swap" : {
          "total_in_bytes" : 0,
          "free_in_bytes" : 0,
          "used_in_bytes" : 0
        }
      },
      "process" : {
        "timestamp" : 1494501231060,
        "open_file_descriptors" : 203,
        "max_file_descriptors" : 65536,
        "cpu" : {
          "percent" : 0,
          "total_in_millis" : 14890
        },
        "mem" : {
          "total_virtual_in_bytes" : 23821713408
        }
      },
      "jvm" : {
        "timestamp" : 1494501231060,
        "uptime_in_millis" : 369041,
        "mem" : {
          "heap_used_in_bytes" : 2323777096,
          "heap_used_percent" : 72,
          "heap_committed_in_bytes" : 3203792896,
          "heap_max_in_bytes" : 3203792896,
          "non_heap_used_in_bytes" : 52525744,
          "non_heap_committed_in_bytes" : 53305344,
          "pools" : {
            "young" : {
              "used_in_bytes" : 121416432,
              "max_in_bytes" : 139591680,
              "peak_used_in_bytes" : 139591680,
              "peak_max_in_bytes" : 139591680
            },
            "survivor" : {
              "used_in_bytes" : 4653304,
              "max_in_bytes" : 17432576,
              "peak_used_in_bytes" : 17432576,
              "peak_max_in_bytes" : 17432576
            },
            "old" : {
              "used_in_bytes" : 2197707360,
              "max_in_bytes" : 3046768640,
              "peak_used_in_bytes" : 2197707360,
              "peak_max_in_bytes" : 3046768640
            }
          }
        },
        "threads" : {
          "count" : 34,
          "peak_count" : 42
        },
        "gc" : {
          "collectors" : {
            "young" : {
              "collection_count" : 23,
              "collection_time_in_millis" : 1027
            },
            "old" : {
              "collection_count" : 1,
              "collection_time_in_millis" : 26
            }
          }
        },
        "buffer_pools" : {
          "direct" : {
            "count" : 24,
            "used_in_bytes" : 3964472,
            "total_capacity_in_bytes" : 3964472
          },
          "mapped" : {
            "count" : 33,
            "used_in_bytes" : 18005744733,
            "total_capacity_in_bytes" : 18005744733
          }
        },
        "classes" : {
          "current_loaded_count" : 7490,
          "total_loaded_count" : 7490,
          "total_unloaded_count" : 0
        }
      },
      "thread_pool" : {
        "bulk" : {
          "threads" : 0,
          "queue" : 0,
          "active" : 0,
          "rejected" : 0,
          "largest" : 0,
          "completed" : 0
        },
        "fetch_shard_started" : {
          "threads" : 1,
          "queue" : 0,
          "active" : 0,
          "rejected" : 0,
          "largest" : 1,
          "completed" : 1
        },
        "fetch_shard_store" : {
          "threads" : 0,
          "queue" : 0,
          "active" : 0,
          "rejected" : 0,
          "largest" : 0,
          "completed" : 0
        },
        "flush" : {
          "threads" : 1,
          "queue" : 0,
          "active" : 0,
          "rejected" : 0,
          "largest" : 1,
          "completed" : 2
        },
        "force_merge" : {
          "threads" : 0,
          "queue" : 0,
          "active" : 0,
          "rejected" : 0,
          "largest" : 0,
          "completed" : 0
        },
        "generic" : {
          "threads" : 1,
          "queue" : 0,
          "active" : 0,
          "rejected" : 0,
          "largest" : 5,
          "completed" : 69
        },
        "get" : {
          "threads" : 0,
          "queue" : 0,
          "active" : 0,
          "rejected" : 0,
          "largest" : 0,
          "completed" : 0
        },
        "index" : {
          "threads" : 0,
          "queue" : 0,
          "active" : 0,
          "rejected" : 0,
          "largest" : 0,
          "completed" : 0
        },
        "listener" : {
          "threads" : 1,
          "queue" : 0,
          "active" : 0,
          "rejected" : 0,
          "largest" : 1,
          "completed" : 2
        },
        "management" : {
          "threads" : 3,
          "queue" : 0,
          "active" : 1,
          "rejected" : 0,
          "largest" : 3,
          "completed" : 41
        },
        "percolate" : {
          "threads" : 0,
          "queue" : 0,
          "active" : 0,
          "rejected" : 0,
          "largest" : 0,
          "completed" : 0
        },
        "refresh" : {
          "threads" : 1,
          "queue" : 0,
          "active" : 0,
          "rejected" : 0,
          "largest" : 1,
          "completed" : 1
        },
        "search" : {
          "threads" : 0,
          "queue" : 0,
          "active" : 0,
          "rejected" : 0,
          "largest" : 0,
          "completed" : 0
        },
        "snapshot" : {
          "threads" : 0,
          "queue" : 0,
          "active" : 0,
          "rejected" : 0,
          "largest" : 0,
          "completed" : 0
        },
        "suggest" : {
          "threads" : 0,
          "queue" : 0,
          "active" : 0,
          "rejected" : 0,
          "largest" : 0,
          "completed" : 0
        },
        "warmer" : {
          "threads" : 1,
          "queue" : 0,
          "active" : 0,
          "rejected" : 0,
          "largest" : 1,
          "completed" : 1
        }
      },
      "fs" : {
        "timestamp" : 1494501231060,
        "total" : {
          "total_in_bytes" : 528311836672,
          "free_in_bytes" : 249557147648,
          "available_in_bytes" : 222696878080
        },
        "data" : [ {
          "path" : "/mnt/esdata2/data/escluster1/nodes/0",
          "mount" : "/mnt/esdata2 (/dev/xvdh1)",
          "type" : "ext4",
          "total_in_bytes" : 528311836672,
          "free_in_bytes" : 249557147648,
          "available_in_bytes" : 222696878080,
          "spins" : "false"
        } ]
      },
      "transport" : {
        "server_open" : 0,
        "rx_count" : 6,
        "rx_size_in_bytes" : 2352,
        "tx_count" : 6,
        "tx_size_in_bytes" : 2352
      },
      "http" : {
        "current_open" : 1,
        "total_opened" : 6
      },
      "breakers" : {
        "request" : {
          "limit_size_in_bytes" : 1281517158,
          "limit_size" : "1.1gb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "fielddata" : {
          "limit_size_in_bytes" : 1922275737,
          "limit_size" : "1.7gb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.03,
          "tripped" : 0
        },
        "in_flight_requests" : {
          "limit_size_in_bytes" : 3203792896,
          "limit_size" : "2.9gb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "parent" : {
          "limit_size_in_bytes" : 2242655027,
          "limit_size" : "2gb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.0,
          "tripped" : 0
        }
      },
      "script" : {
        "compilations" : 0,
        "cache_evictions" : 0
      }
    }
  }
}

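1 answer:

Answer 0: (score: 1)

The static data in the segments associated with your data (terms, inverted index, etc.) is very large: "memory_in_bytes" : 2272548617 is 2.11 GB.

That is why your ES node cannot do anything when you give it a 2 GB heap.

On top of the 2.11 GB of static data, indexing and searching will of course need more memory. So your node needs at least a 3 GB heap and at least 6 GB of RAM.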
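The arithmetic behind this can be sketched in a few lines of Python. This is an illustration only: the helper `segment_headroom` is hypothetical, the byte counts are copied from the stats in the question, and the 2 GB case uses a nominal 2 GiB because the question does not include stats for that run (the startup log reports the actual heap as 1.9gb, so the real headroom is even smaller).

```python
GIB = 1024 ** 3


def segment_headroom(segments_bytes, heap_max_bytes):
    """Return (segment memory as a fraction of the heap, bytes left over)."""
    return segments_bytes / heap_max_bytes, heap_max_bytes - segments_bytes


# segments.memory_in_bytes from the question's node stats
segments = 2272548617

heaps = {
    "2 GB heap (nominal 2 GiB, assumed)": 2 * GIB,
    "3 GB heap (heap_max_in_bytes from the stats)": 3203792896,
}

for label, heap in heaps.items():
    frac, left = segment_headroom(segments, heap)
    print(f"{label}: segments need {frac:.0%} of heap, {left / GIB:+.2f} GiB left")
```

A negative leftover for the 2 GB case means the segment data alone does not fit, before any indexing buffers or search requests are counted.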