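After doing some _bulk inserts into an Elasticsearch index, I ran into some strange behavior.
Say I have millions of documents, but in this example I only need to insert 227,768 of them, so I insert them in chunks of 50,000.
To improve performance, I first disable the refresh interval with the following command: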
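A minimal Python sketch of splitting the documents into _bulk bodies of 50,000 documents each, assuming the documents are available as an iterable of dicts. The function name build_bulk_chunks is hypothetical, and the _id scheme (a single counter that keeps running across chunks, matching the sample IDs 0, 1, 2) is an assumption, not the poster's actual code.

```python
import json
from typing import Iterable, Iterator

INDEX = "agg_vit_2016-08-07"  # index name from the question
DOC_TYPE = "player"           # mapping type used in the question's bulk lines
CHUNK_SIZE = 50_000           # chunk size described in the question


def build_bulk_chunks(docs: Iterable[dict], chunk_size: int = CHUNK_SIZE) -> Iterator[str]:
    """Yield NDJSON _bulk bodies holding at most chunk_size documents each.

    Hypothetical helper: the _id comes from one counter shared by all chunks,
    so the second chunk starts at "50000" instead of restarting at "0".
    """
    lines = []
    in_chunk = 0
    for doc_id, doc in enumerate(docs):
        action = {"index": {"_index": INDEX, "_type": DOC_TYPE, "_id": str(doc_id)}}
        lines.append(json.dumps(action))  # action/metadata line
        lines.append(json.dumps(doc))     # document source line
        in_chunk += 1
        if in_chunk == chunk_size:
            yield "\n".join(lines) + "\n"  # a _bulk body must end with a newline
            lines, in_chunk = [], 0
    if lines:  # final, partially filled chunk
        yield "\n".join(lines) + "\n"
```

With 227,768 documents this yields four full chunks of 50,000 and a final chunk of 27,768.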
PUT agg_vit_2016-08-07/_settings
{
  "index" : {
    "refresh_interval" : -1
  }
}
Before inserting, I check that there are no duplicate IDs.
Example of the bulk lines being inserted:
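The duplicate-ID check can be sketched in Python as below, assuming each bulk body is an NDJSON string in which every action line is followed by exactly one source line (true for index actions, as in the sample). The helper name duplicate_ids is hypothetical. It scans all chunks together, because _id uniqueness applies across the whole index, not per request.

```python
import json
from collections import Counter
from typing import Iterable


def duplicate_ids(bulk_bodies: Iterable[str]) -> dict[str, int]:
    """Return {_id: occurrences} for every _id used more than once
    across all the given NDJSON _bulk bodies (hypothetical helper)."""
    seen = Counter()
    for body in bulk_bodies:
        lines = [line for line in body.splitlines() if line.strip()]
        # Even positions are action lines, odd positions are document sources.
        for action_line in lines[0::2]:
            action = json.loads(action_line)
            meta = next(iter(action.values()))  # e.g. {"_index": ..., "_type": ..., "_id": "0"}
            doc_id = meta.get("_id")
            if doc_id is not None:  # skip actions that rely on auto-generated IDs
                seen[doc_id] += 1
    return {doc_id: n for doc_id, n in seen.items() if n > 1}
```

An empty result means no _id repeats anywhere in the data being sent.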
{ "index" : { "_index" : "agg_vit_2016-08-07", "_type" : "player", "_id" : "0" } }
{"time":"2016-08-07T03:00:00Z","domain":"www.cbs.com","mt_id":472,"cb_id":0,"c_id":0,"master_domain":"472###www.cbs.com","child_domain":"0###www.cbs.com","combo_domain":"0###www.cbs.com","playerrequest":0,"playerload":1,"adrequest":0,"adload":0,"adview":0}
{ "index" : { "_index" : "agg_vit_2016-08-07", "_type" : "player", "_id" : "1" } }
{"time":"2016-08-07T03:00:00Z","domain":"www.thebrick.com","mt_id":478,"cb_id":0,"c_id":0,"master_domain":"478###www.thebrick.com","child_domain":"0###www.thebrick.com","combo_domain":"0###www.thebrick.com","playerrequest":0,"playerload":3,"adrequest":0,"adload":0,"adview":0}
{ "index" : { "_index" : "agg_vit_2016-08-07", "_type" : "player", "_id" : "2" } }
{"time":"2016-08-07T04:00:00Z","domain":"kisshealthandbeauty.com","mt_id":618,"cb_id":0,"c_id":0,"master_domain":"618###kisshealthandbeauty.com","child_domain":"0###kisshealthandbeauty.com","combo_domain":"0###kisshealthandbeauty.com","playerrequest":0,"playerload":20,"adrequest":0,"adload":0,"adview":0}
...
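One way to see what each bulk request actually did is to inspect its response body, not just the HTTP status of the request. The sketch below assumes the response has already been parsed into a dict containing the items list described in the Bulk API documentation; the function name summarize_bulk_response is hypothetical. Per-item fields differ between Elasticsearch versions, so it relies only on status (201 for a newly created document, 200 when a document with that _id already existed) and on the presence of an error object.

```python
from collections import Counter


def summarize_bulk_response(response: dict) -> Counter:
    """Classify each item of a parsed _bulk response (hypothetical helper).

    created:     status 201, a new document was written
    overwritten: status 200, a document with that _id already existed and was
                 replaced; the previous version is marked deleted
    failed:      the item carries an "error" object (e.g. a 429 rejection)
    """
    tally = Counter()
    for item in response.get("items", []):
        result = next(iter(item.values()))  # the {"_id": ..., "status": ...} object
        if "error" in result:
            tally["failed"] += 1
        elif result.get("status") == 201:
            tally["created"] += 1
        elif result.get("status") == 200:
            tally["overwritten"] += 1
        else:
            tally["other"] += 1
    return tally
```

Summing these counters over every chunk gives a per-run total that can be compared directly with the _count result.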
After the inserts finish, I re-enable refresh:
PUT agg_vit_2016-08-07/_settings
{
  "index" : {
    "refresh_interval" : "1ms"
  }
}
But when I query the final count, the total is wrong:
GET agg_vit_2016-08-07/_count
{
  "count": 199991,
  "_shards": {
    "total": 10,
    "successful": 10,
    "failed": 0
  }
}
Some of the index stats from Elasticsearch:
"_shards": {
  "total": 10,
  "successful": 10,
  "failed": 0
},
"_all": {
  "primaries": {
    "docs": {
      "count": 199991,
      "deleted": 27766
    },
    "store": {
      "size_in_bytes": 49856669,
      "throttle_time_in_millis": 0
    },
    "indexing": {
      "index_total": 227757,
      "index_time_in_millis": 26347,
      "index_current": 0,
      "index_failed": 0,
      "delete_total": 0,
      "delete_time_in_millis": 0,
      "delete_current": 0,
      "noop_update_total": 0,
      "is_throttled": false,
      "throttle_time_in_millis": 0
    },....
    ....
    "merges": {
      "current": 0,
      "current_docs": 0,
      "current_size_in_bytes": 0,
      "total": 0,
      "total_time_in_millis": 0,
      "total_docs": 0,
      "total_size_in_bytes": 0,
      "total_stopped_time_in_millis": 0,
      "total_throttled_time_in_millis": 0,
      "total_auto_throttle_in_bytes": 209715200
    },
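The stats can be cross-checked with plain arithmetic on the figures posted above. The sketch assumes no explicit deletes (delete_total is 0) and no merges yet (merges.total is 0); under those conditions each index operation should leave exactly one document that is either live or marked deleted.

```python
# Figures as posted in the question.
sent = 227_768         # documents the poster intended to insert
index_total = 227_757  # _all.primaries.indexing.index_total
live = 199_991         # _all.primaries.docs.count (same as the _count result)
deleted = 27_766       # _all.primaries.docs.deleted

# With no explicit deletes and no merges, every index operation leaves one
# document that is either live or marked deleted (indexing over an existing
# _id marks the previous version deleted).
assert live + deleted == index_total

# Documents sent that do not show up as index operations at all.
unaccounted = sent - index_total
print(unaccounted)  # prints 11
```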
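When I query the documents, some are missing (as the stats output shows).
I don't understand what the problem could be.
Why do I have deleted documents ("deleted": 27766),
and, on top of that, why are 9 more documents missing ("count": 199991)?
Thanks!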
维塔利彼得