I am trying to bulk-insert data into a 4-node Elasticsearch cluster with 3 data nodes.
Data node specs: 16 CPUs - 7GB RAM - 500GB SSD
I send the inserts to the non-data node; the index is split into 5 shards with 1 replica. About 250GB of data needs to be inserted.
However, after about an hour of processing, with roughly 40GB of data inserted per node and around 60% CPU and 30% RAM usage over the whole period, some shards end up stuck in the initializing state:
~$ curl -XGET 'http://localhost:9200/_cluster/health/osm?level=shards&pretty=true'
{
"cluster_name" : "elastic_osm",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 4,
"number_of_data_nodes" : 3,
"active_primary_shards" : 5,
"active_shards" : 9,
"relocating_shards" : 1,
"initializing_shards" : 1,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"indices" : {
"osm" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 9,
"relocating_shards" : 1,
"initializing_shards" : 1,
"unassigned_shards" : 0,
"shards" : {
"0" : {
"status" : "yellow",
"primary_active" : true,
"active_shards" : 1,
"relocating_shards" : 0,
"initializing_shards" : 1,
"unassigned_shards" : 0
},
"1" : {
"status" : "green",
"primary_active" : true,
"active_shards" : 2,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0
},
"2" : {
"status" : "green",
"primary_active" : true,
"active_shards" : 2,
"relocating_shards" : 1,
"initializing_shards" : 0,
"unassigned_shards" : 0
},
"3" : {
"status" : "green",
"primary_active" : true,
"active_shards" : 2,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0
},
"4" : {
"status" : "green",
"primary_active" : true,
"active_shards" : 2,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0
}
}
}
}
}
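As a sanity check, the shard arithmetic in this health output is internally consistent: 5 primaries with 1 replica each means 10 shard copies in total, and the reported active (9) plus initializing (1) copies account for all of them. A minimal sketch of that bookkeeping, using only the numbers from the response above:

```python
# Shard bookkeeping for an index with 5 primaries and 1 replica,
# using the numbers reported by the _cluster/health call above.
number_of_shards = 5
number_of_replicas = 1
total_copies = number_of_shards * (1 + number_of_replicas)  # 10

active_shards = 9
initializing_shards = 1
unassigned_shards = 0

# Every shard copy is either active, initializing, or unassigned.
assert active_shards + initializing_shards + unassigned_shards == total_copies

# "yellow" means all primaries are active but not all replicas are.
active_primary_shards = 5
if active_shards == total_copies:
    status = "green"
elif active_primary_shards == number_of_shards:
    status = "yellow"
else:
    status = "red"
print(status)  # yellow
```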
Digging deeper, I found that one node has a heap-space problem:
~$ curl -XGET 'localhost:9200/osm/_search_shards?pretty=true'
{
"nodes" : {
"1DpvDUf7SKywJrBgQqs9eg" : {
"name" : "elastic-osm-node-1",
"transport_address" : "inet[/xxx.xxx.x.x:xxxx]",
"attributes" : {
"master" : "true"
}
},
"FiBYw-v_QfO3nJQfHflf_w" : {
"name" : "elastic-osm-node-3",
"transport_address" : "inet[/xxx.xxx.x.x:x]",
"attributes" : {
"master" : "true"
}
},
"ibpt8lGiS6yDJf4e09RN9Q" : {
"name" : "elastic-osm-node-2",
"transport_address" : "inet[/xxx.xxx.x.x:xxxx]",
"attributes" : {
"master" : "true"
}
}
},
"shards" : [ [ {
"state" : "STARTED",
"primary" : true,
"node" : "ibpt8lGiS6yDJf4e09RN9Q",
"relocating_node" : null,
"shard" : 0,
"index" : "osm"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "FiBYw-v_QfO3nJQfHflf_w",
"relocating_node" : null,
"shard" : 0,
"index" : "osm",
"unassigned_info" : {
"reason" : "ALLOCATION_FAILED",
"at" : "2015-10-30T10:42:25.539Z",
"details" : "shard failure [engine failure, reason [already closed by tragic event]][OutOfMemoryError[Java heap space]]"
}
} ], [ {
"state" : "STARTED",
"primary" : true,
"node" : "FiBYw-v_QfO3nJQfHflf_w",
"relocating_node" : null,
"shard" : 1,
"index" : "osm"
}, {
"state" : "STARTED",
"primary" : false,
"node" : "1DpvDUf7SKywJrBgQqs9eg",
"relocating_node" : null,
"shard" : 1,
"index" : "osm"
} ], [ {
"state" : "RELOCATING",
"primary" : false,
"node" : "FiBYw-v_QfO3nJQfHflf_w",
"relocating_node" : "1DpvDUf7SKywJrBgQqs9eg",
"shard" : 2,
"index" : "osm"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "ibpt8lGiS6yDJf4e09RN9Q",
"relocating_node" : null,
"shard" : 2,
"index" : "osm"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "1DpvDUf7SKywJrBgQqs9eg",
"relocating_node" : "FiBYw-v_QfO3nJQfHflf_w",
"shard" : 2,
"index" : "osm"
} ], [ {
"state" : "STARTED",
"primary" : false,
"node" : "FiBYw-v_QfO3nJQfHflf_w",
"relocating_node" : null,
"shard" : 3,
"index" : "osm"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "1DpvDUf7SKywJrBgQqs9eg",
"relocating_node" : null,
"shard" : 3,
"index" : "osm"
} ], [ {
"state" : "STARTED",
"primary" : false,
"node" : "ibpt8lGiS6yDJf4e09RN9Q",
"relocating_node" : null,
"shard" : 4,
"index" : "osm"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "FiBYw-v_QfO3nJQfHflf_w",
"relocating_node" : null,
"shard" : 4,
"index" : "osm"
} ] ]
}
However, ES_HEAP_SIZE on the servers is set to half of the RAM:
~$ echo $ES_HEAP_SIZE
7233.0m
and only about 5 GB of RAM is actually in use:
~$ free -g
total used
Mem: 14 5
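For what it's worth, the reported heap size does match the usual half-of-RAM rule of thumb: 7233 MB is almost exactly half of the 14 GB that `free -g` reports. A quick check, using only the numbers from the output above:

```python
# Numbers taken from the command output above.
ram_gb = 14       # total RAM reported by `free -g`
heap_mb = 7233.0  # value of $ES_HEAP_SIZE

heap_gb = heap_mb / 1024
print(round(heap_gb, 2))                 # 7.06
assert abs(heap_gb - ram_gb / 2) < 0.1   # heap is ~50% of RAM
```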
If I wait a bit longer, the node leaves the cluster entirely and all its replicas go into the initializing state, which makes my inserts fail and stop:
{
"state" : "INITIALIZING",
"primary" : false,
"node" : "ibpt8lGiS6yDJf4e09RN9Q",
"relocating_node" : null,
"shard" : 3,
"index" : "osm",
"unassigned_info" : {
"reason" : "NODE_LEFT",
"at" : "2015-10-30T10:53:32.044Z",
"details" : "node_left[FiBYw-v_QfO3nJQfHflf_w]"
}
Config: to speed up the inserts, I use these settings in the data nodes' elasticsearch
configuration: refresh_interval: -1, threadpool.bulk.size: 16, threadpool.bulk.queue_size: 1000
Why does this happen? How can I fix it and make my bulk insert succeed? Do I need more than 50% of the RAM for the maximum heap size?
EDIT: Since tuning the Elasticsearch thread-pool settings is discouraged, I removed the threadpool parameters, but now indexing runs very slowly. Elasticsearch is not designed to ingest this much data this fast.

Answer (score: 0):
Remove these settings:
threadpool.bulk.size: 16
threadpool.bulk.queue_size: 1000
The defaults for these settings are good enough; they will not overload your cluster.
Also make sure you size your bulk indexing requests properly, as described here. Depending on your cluster and data, bulk requests need to have a certain size. You cannot just use whatever values you like in the hope of ingesting as much as possible; every cluster has its limits, and you should test yours.
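One way to follow that advice is to cap each bulk request both by document count and by payload size, rather than firing arbitrarily large batches. The sketch below is purely illustrative: `chunk_bulk_actions` is a hypothetical helper (not part of any Elasticsearch client), and the limits are assumed starting points that must be tuned against your own cluster.

```python
import json

def chunk_bulk_actions(docs, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Yield lists of documents, each small enough for one bulk request.

    max_docs and max_bytes are starting points only; the right values
    depend on your cluster and data, and must be found by testing.
    """
    batch, batch_bytes = [], 0
    for doc in docs:
        size = len(json.dumps(doc).encode("utf-8"))
        if batch and (len(batch) >= max_docs or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch

# Example: 2500 small docs with a 1000-doc cap -> 3 batches.
batches = list(chunk_bulk_actions({"id": i} for i in range(2500)))
print([len(b) for b in batches])  # [1000, 1000, 500]
```

Each yielded batch would then be sent as one bulk request; if the cluster still struggles, lower the caps rather than raising the thread-pool queue.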