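I have a 2.4.4 cluster in which one server/node (esnode1) holds a single 220GB index with 1 shard and zero replicas.
The index responds fine, but whenever I cleanly restart the server (an EC2 instance with 2 CPUs, 4GB RAM and a 500GB SSD), the cluster status stays red with "initializing_shards" = 1 for a long time, yet there is no CPU or disk activity (the system is idle and not swapping).
I have raised indices.recovery.max_bytes_per_sec to 50mb and tried the instructions at https://www.elastic.co/guide/en/elasticsearch/guide/current/_rolling_restarts.html, without success.
This only happens when ES is given a 2GB heap. With a 3GB heap, the cluster status turns green within seconds of a restart.
I'm at a loss as to how to debug or understand this, since the logs (below) look normal. Any hints?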
/_cluster/health returns:
{
"cluster_name" : "escluster1",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 0,
"active_shards" : 0,
"relocating_shards" : 0,
"initializing_shards" : 1,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 0.0
}
This is the log after the restart:
[2017-05-04 15:00:37,975][INFO ][node ] [esnode1] version[2.4.4], pid[2761], build[fcbb46d/2017-01-03T11:33:16Z]
[2017-05-04 15:00:37,976][INFO ][node ] [esnode1] initializing ...
[2017-05-04 15:00:38,534][INFO ][plugins ] [esnode1] modules [reindex, lang-expression, lang-groovy], plugins [], sites []
[2017-05-04 15:00:38,563][INFO ][env ] [esnode1] using [1] data paths, mounts [[/mnt/esdata2 (/dev/xvdh1)]], net usable_space [226.3gb], net total_space [492gb], spins? [no], types [ext4]
[2017-05-04 15:00:38,563][INFO ][env ] [esnode1] heap size [1.9gb], compressed ordinary object pointers [true]
[2017-05-04 15:00:40,379][INFO ][node ] [esnode1] initialized
[2017-05-04 15:00:40,380][INFO ][node ] [esnode1] starting ...
[2017-05-04 15:00:40,501][INFO ][transport ] [esnode1] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
[2017-05-04 15:00:40,506][INFO ][discovery ] [esnode1] escluster1/sv3aHhUjSyueq5N4_w14mQ
[2017-05-04 15:00:43,565][INFO ][cluster.service ] [esnode1] new_master {esnode1}{sv3aHhUjSyueq5N4_w14mQ}{127.0.0.1}{127.0.0.1:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2017-05-04 15:00:43,595][INFO ][indices.recovery ] [esnode1] updating [indices.recovery.max_bytes_per_sec] from [40mb] to [50mb]
[2017-05-04 15:00:43,631][INFO ][http ] [esnode1] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[2017-05-04 15:00:43,632][INFO ][node ] [esnode1] started
[2017-05-04 15:00:43,651][INFO ][gateway ] [esnode1] recovered
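Edit 1: After switching the log level to DEBUG, with a 2GB heap and the cluster status stuck at "red", I can see the following messages logged repeatedly every 30 seconds: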
[2017-05-10 15:58:45,985][DEBUG][index.shard ] [esnode1] [myIndex][0] updateBufferSize: engine is closed; skipping
[2017-05-10 15:59:15,985][DEBUG][indices.memory ] [esnode1] recalculating shard indexing buffer, total is [203.1mb] with [1] active shards, each shard set to indexing=[203.1mb], translog=[64kb]
[2017-05-10 15:59:15,990][DEBUG][index.shard ] [esnode1] [myIndex][0] updateBufferSize: engine is closed; skipping
[2017-05-10 15:59:45,990][DEBUG][indices.memory ] [esnode1] recalculating shard indexing buffer, total is [203.1mb] with [1] active shards, each shard set to indexing=[203.1mb], translog=[64kb]
[2017-05-10 15:59:45,997][DEBUG][index.shard ] [esnode1] [myIndex][0] updateBufferSize: engine is closed; skipping
[2017-05-10 16:00:15,997][DEBUG][indices.memory ] [esnode1] recalculating shard indexing buffer, total is [203.1mb] with [1] active shards, each shard set to indexing=[203.1mb], translog=[64kb]
Edit 2: Output produced with a 3GB heap and "green" status:
_nodes/stats?filter_path=**.indices.segments:
{
"nodes" : {
"TeXgE1QKSMOE1xYS-miJug" : {
"indices" : {
"segments" : {
"count" : 73,
"memory_in_bytes" : 2272548617,
"terms_memory_in_bytes" : 2269433701,
"stored_fields_memory_in_bytes" : 3103096,
"term_vectors_memory_in_bytes" : 0,
"norms_memory_in_bytes" : 4672,
"doc_values_memory_in_bytes" : 7148,
"index_writer_memory_in_bytes" : 0,
"index_writer_max_memory_in_bytes" : 320379289,
"version_map_memory_in_bytes" : 0,
"fixed_bit_set_memory_in_bytes" : 0
}
}
}
}
/_nodes/stats/jvm?filter_path=**.heap_used_in_bytes
{
"cluster_name" : "escluster1",
"nodes" : {
"TeXgE1QKSMOE1xYS-miJug" : {
"timestamp" : 1494501231058,
"name" : "esnode1",
"transport_address" : "127.0.0.1:9300",
"host" : "127.0.0.1",
"ip" : [ "127.0.0.1:9300", "NONE" ],
"indices" : {
"docs" : {
"count" : 5352169,
"deleted" : 0
},
"store" : {
"size_in_bytes" : 234847391460,
"throttle_time_in_millis" : 0
},
"indexing" : {
"index_total" : 0,
"index_time_in_millis" : 0,
"index_current" : 0,
"index_failed" : 0,
"delete_total" : 0,
"delete_time_in_millis" : 0,
"delete_current" : 0,
"noop_update_total" : 0,
"is_throttled" : false,
"throttle_time_in_millis" : 0
},
"get" : {
"total" : 0,
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"open_contexts" : 0,
"query_total" : 0,
"query_time_in_millis" : 0,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time_in_millis" : 0,
"fetch_current" : 0,
"scroll_total" : 0,
"scroll_time_in_millis" : 0,
"scroll_current" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size_in_bytes" : 0,
"total" : 0,
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size_in_bytes" : 0,
"total_stopped_time_in_millis" : 0,
"total_throttled_time_in_millis" : 0,
"total_auto_throttle_in_bytes" : 20971520
},
"refresh" : {
"total" : 1,
"total_time_in_millis" : 14
},
"flush" : {
"total" : 1,
"total_time_in_millis" : 10
},
"warmer" : {
"current" : 0,
"total" : 3,
"total_time_in_millis" : 6
},
"query_cache" : {
"memory_size_in_bytes" : 0,
"total_count" : 0,
"hit_count" : 0,
"miss_count" : 0,
"cache_size" : 0,
"cache_count" : 0,
"evictions" : 0
},
"fielddata" : {
"memory_size_in_bytes" : 0,
"evictions" : 0
},
"percolate" : {
"total" : 0,
"time_in_millis" : 0,
"current" : 0,
"memory_size_in_bytes" : -1,
"memory_size" : "-1b",
"queries" : 0
},
"completion" : {
"size_in_bytes" : 0
},
"segments" : {
"count" : 73,
"memory_in_bytes" : 2272548617,
"terms_memory_in_bytes" : 2269433701,
"stored_fields_memory_in_bytes" : 3103096,
"term_vectors_memory_in_bytes" : 0,
"norms_memory_in_bytes" : 4672,
"doc_values_memory_in_bytes" : 7148,
"index_writer_memory_in_bytes" : 0,
"index_writer_max_memory_in_bytes" : 512000,
"version_map_memory_in_bytes" : 0,
"fixed_bit_set_memory_in_bytes" : 0
},
"translog" : {
"operations" : 0,
"size_in_bytes" : 43
},
"suggest" : {
"total" : 0,
"time_in_millis" : 0,
"current" : 0
},
"request_cache" : {
"memory_size_in_bytes" : 0,
"evictions" : 0,
"hit_count" : 0,
"miss_count" : 0
},
"recovery" : {
"current_as_source" : 0,
"current_as_target" : 0,
"throttle_time_in_millis" : 0
}
},
"os" : {
"timestamp" : 1494501231060,
"cpu_percent" : 0,
"load_average" : 0.0,
"mem" : {
"total_in_bytes" : 4142092288,
"free_in_bytes" : 117051392,
"used_in_bytes" : 4025040896,
"free_percent" : 3,
"used_percent" : 97
},
"swap" : {
"total_in_bytes" : 0,
"free_in_bytes" : 0,
"used_in_bytes" : 0
}
},
"process" : {
"timestamp" : 1494501231060,
"open_file_descriptors" : 203,
"max_file_descriptors" : 65536,
"cpu" : {
"percent" : 0,
"total_in_millis" : 14890
},
"mem" : {
"total_virtual_in_bytes" : 23821713408
}
},
"jvm" : {
"timestamp" : 1494501231060,
"uptime_in_millis" : 369041,
"mem" : {
"heap_used_in_bytes" : 2323777096,
"heap_used_percent" : 72,
"heap_committed_in_bytes" : 3203792896,
"heap_max_in_bytes" : 3203792896,
"non_heap_used_in_bytes" : 52525744,
"non_heap_committed_in_bytes" : 53305344,
"pools" : {
"young" : {
"used_in_bytes" : 121416432,
"max_in_bytes" : 139591680,
"peak_used_in_bytes" : 139591680,
"peak_max_in_bytes" : 139591680
},
"survivor" : {
"used_in_bytes" : 4653304,
"max_in_bytes" : 17432576,
"peak_used_in_bytes" : 17432576,
"peak_max_in_bytes" : 17432576
},
"old" : {
"used_in_bytes" : 2197707360,
"max_in_bytes" : 3046768640,
"peak_used_in_bytes" : 2197707360,
"peak_max_in_bytes" : 3046768640
}
}
},
"threads" : {
"count" : 34,
"peak_count" : 42
},
"gc" : {
"collectors" : {
"young" : {
"collection_count" : 23,
"collection_time_in_millis" : 1027
},
"old" : {
"collection_count" : 1,
"collection_time_in_millis" : 26
}
}
},
"buffer_pools" : {
"direct" : {
"count" : 24,
"used_in_bytes" : 3964472,
"total_capacity_in_bytes" : 3964472
},
"mapped" : {
"count" : 33,
"used_in_bytes" : 18005744733,
"total_capacity_in_bytes" : 18005744733
}
},
"classes" : {
"current_loaded_count" : 7490,
"total_loaded_count" : 7490,
"total_unloaded_count" : 0
}
},
"thread_pool" : {
"bulk" : {
"threads" : 0,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 0,
"completed" : 0
},
"fetch_shard_started" : {
"threads" : 1,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 1,
"completed" : 1
},
"fetch_shard_store" : {
"threads" : 0,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 0,
"completed" : 0
},
"flush" : {
"threads" : 1,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 1,
"completed" : 2
},
"force_merge" : {
"threads" : 0,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 0,
"completed" : 0
},
"generic" : {
"threads" : 1,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 5,
"completed" : 69
},
"get" : {
"threads" : 0,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 0,
"completed" : 0
},
"index" : {
"threads" : 0,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 0,
"completed" : 0
},
"listener" : {
"threads" : 1,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 1,
"completed" : 2
},
"management" : {
"threads" : 3,
"queue" : 0,
"active" : 1,
"rejected" : 0,
"largest" : 3,
"completed" : 41
},
"percolate" : {
"threads" : 0,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 0,
"completed" : 0
},
"refresh" : {
"threads" : 1,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 1,
"completed" : 1
},
"search" : {
"threads" : 0,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 0,
"completed" : 0
},
"snapshot" : {
"threads" : 0,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 0,
"completed" : 0
},
"suggest" : {
"threads" : 0,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 0,
"completed" : 0
},
"warmer" : {
"threads" : 1,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 1,
"completed" : 1
}
},
"fs" : {
"timestamp" : 1494501231060,
"total" : {
"total_in_bytes" : 528311836672,
"free_in_bytes" : 249557147648,
"available_in_bytes" : 222696878080
},
"data" : [ {
"path" : "/mnt/esdata2/data/escluster1/nodes/0",
"mount" : "/mnt/esdata2 (/dev/xvdh1)",
"type" : "ext4",
"total_in_bytes" : 528311836672,
"free_in_bytes" : 249557147648,
"available_in_bytes" : 222696878080,
"spins" : "false"
} ]
},
"transport" : {
"server_open" : 0,
"rx_count" : 6,
"rx_size_in_bytes" : 2352,
"tx_count" : 6,
"tx_size_in_bytes" : 2352
},
"http" : {
"current_open" : 1,
"total_opened" : 6
},
"breakers" : {
"request" : {
"limit_size_in_bytes" : 1281517158,
"limit_size" : "1.1gb",
"estimated_size_in_bytes" : 0,
"estimated_size" : "0b",
"overhead" : 1.0,
"tripped" : 0
},
"fielddata" : {
"limit_size_in_bytes" : 1922275737,
"limit_size" : "1.7gb",
"estimated_size_in_bytes" : 0,
"estimated_size" : "0b",
"overhead" : 1.03,
"tripped" : 0
},
"in_flight_requests" : {
"limit_size_in_bytes" : 3203792896,
"limit_size" : "2.9gb",
"estimated_size_in_bytes" : 0,
"estimated_size" : "0b",
"overhead" : 1.0,
"tripped" : 0
},
"parent" : {
"limit_size_in_bytes" : 2242655027,
"limit_size" : "2gb",
"estimated_size_in_bytes" : 0,
"estimated_size" : "0b",
"overhead" : 1.0,
"tripped" : 0
}
},
"script" : {
"compilations" : 0,
"cache_evictions" : 0
}
}
}
}
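Answer 0 (score: 1)
The static segments data associated with your index (i.e. terms, the inverted index, etc.) is very large: "memory_in_bytes" : 2272548617 is 2.11 GB.
That is why your ES node cannot do anything when you give it a 2GB heap.
On top of the 2.11 GB of static data, indexing and searching of course need more memory. So your node needs at least a 3GB heap and at least 6GB of RAM.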
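The arithmetic behind this can be checked with a small Python sketch. It uses the figures posted in the question and assumes that the segment memory reported by the node stats is held entirely on the JVM heap. It also assumes that the "heap size [1.9gb]" startup log line means 1.9 GiB. The function names are illustrative only and not part of any Elasticsearch API.

```python
GIB = 1024 ** 3

# Values copied from the node stats in the question (3GB-heap run).
SEGMENTS_MEMORY_BYTES = 2_272_548_617  # indices.segments.memory_in_bytes
HEAP_MAX_3GB_BYTES = 3_203_792_896     # jvm.mem.heap_max_in_bytes

# Assumption: the "heap size [1.9gb]" startup log line, read as 1.9 GiB.
HEAP_MAX_2GB_BYTES = int(1.9 * GIB)


def headroom_bytes(heap_max_bytes, resident_bytes):
    """Heap left once the resident data is loaded (negative means it does not fit)."""
    return heap_max_bytes - resident_bytes


def describe(label, heap_max_bytes):
    """Summarize whether the segment memory fits into the given heap."""
    spare = headroom_bytes(heap_max_bytes, SEGMENTS_MEMORY_BYTES)
    verdict = "fits" if spare > 0 else "does not fit"
    return f"{label}: segments {verdict}, headroom {spare / GIB:+.2f} GiB"


if __name__ == "__main__":
    print(describe("2GB heap", HEAP_MAX_2GB_BYTES))
    print(describe("3GB heap", HEAP_MAX_3GB_BYTES))
```

Under these assumptions the segment data alone exceeds the 2GB heap, while the 3GB heap leaves a little under 1 GiB free for everything else the node has to do.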