Elasticsearch: documents missing after bulk insert

Time: 2016-08-10 06:01:49

Tags: elasticsearch indexing lucene bulkinsert elasticsearch-plugin

I ran into some strange behavior after doing several _bulk inserts into an Elasticsearch index.

Say I have millions of documents, but for this example I only need to insert 227,768 of them. I insert them in chunks of 50,000.
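A minimal sketch of the chunking described above, in Python. The helper name `chunked` and the placeholder documents are assumptions for illustration, not part of any Elasticsearch client.

```python
from typing import Iterator, List, Sequence


def chunked(docs: Sequence[dict], size: int = 50_000) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` documents."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(docs), size):
        yield list(docs[start:start + size])


# 227,768 placeholder documents split into chunks of 50,000:
# four full chunks plus one smaller remainder chunk.
sizes = [len(chunk) for chunk in chunked([{}] * 227_768)]
print(sizes)  # [50000, 50000, 50000, 50000, 27768]
```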

To improve performance, I first disable the refresh interval with the following command:

PUT agg_vit_2016-08-07/_settings
{
  "index": {
    "refresh_interval": -1
  }
}

Before inserting, I check that there are no duplicate IDs.
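A check like this could be sketched as follows, assuming all IDs fit in memory. `duplicate_ids` is a hypothetical helper name. Because an `index` action with an existing `_id` replaces the earlier document, the check is only meaningful if it runs over the IDs of all chunks together rather than one chunk at a time.

```python
from collections import Counter
from typing import Dict, Iterable


def duplicate_ids(ids: Iterable[str]) -> Dict[str, int]:
    """Return every ID that occurs more than once, with its occurrence count."""
    counts = Counter(ids)
    return {doc_id: n for doc_id, n in counts.items() if n > 1}


# Hypothetical sample: "1" appears twice, every other ID once.
dupes = duplicate_ids(["0", "1", "2", "1"])
print(dupes)  # {'1': 2}
```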

Example of the bulk lines being inserted:

{ "index" : { "_index" : "agg_vit_2016-08-07", "_type" : "player", "_id" : "0" } }
{"time":"2016-08-07T03:00:00Z","domain":"www.cbs.com","mt_id":472,"cb_id":0,"c_id":0,"master_domain":"472###www.cbs.com","child_domain":"0###www.cbs.com","combo_domain":"0###www.cbs.com","playerrequest":0,"playerload":1,"adrequest":0,"adload":0,"adview":0}
{ "index" : { "_index" : "agg_vit_2016-08-07", "_type" : "player", "_id" : "1" } }
{"time":"2016-08-07T03:00:00Z","domain":"www.thebrick.com","mt_id":478,"cb_id":0,"c_id":0,"master_domain":"478###www.thebrick.com","child_domain":"0###www.thebrick.com","combo_domain":"0###www.thebrick.com","playerrequest":0,"playerload":3,"adrequest":0,"adload":0,"adview":0}
{ "index" : { "_index" : "agg_vit_2016-08-07", "_type" : "player", "_id" : "2" } }
{"time":"2016-08-07T04:00:00Z","domain":"kisshealthandbeauty.com","mt_id":618,"cb_id":0,"c_id":0,"master_domain":"618###kisshealthandbeauty.com","child_domain":"0###kisshealthandbeauty.com","combo_domain":"0###kisshealthandbeauty.com","playerrequest":0,"playerload":20,"adrequest":0,"adload":0,"adview":0}

...
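Bulk lines in the format shown above can be generated along these lines. This is only a sketch: `bulk_body` is a hypothetical helper, and the sample document is shortened to two fields. The bulk format is newline-delimited JSON, one action line followed by one source line, and the body has to end with a newline.

```python
import json
from typing import Iterable, Tuple


def bulk_body(index: str, doc_type: str, docs: Iterable[Tuple[str, dict]]) -> str:
    """Build a newline-delimited _bulk body from (id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))   # action/metadata line
        lines.append(json.dumps(source))   # document source line
    # Every line, including the last one, is terminated by "\n".
    return "".join(line + "\n" for line in lines)


body = bulk_body(
    "agg_vit_2016-08-07",
    "player",
    [("0", {"domain": "www.cbs.com", "playerload": 1})],
)
print(body, end="")
```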

After the insert has finished, I enable refresh again:

PUT agg_vit_2016-08-07/_settings
{
  "index": {
    "refresh_interval": "1ms"
  }
}

But when I query the index afterwards, the total count is wrong:

GET agg_vit_2016-08-07/_count
{
  "count": 199991,
  "_shards": {
    "total": 10,
    "successful": 10,
    "failed": 0
  }
}

Some stats from Elasticsearch:

  "_shards": {
    "total": 10,
    "successful": 10,
    "failed": 0
  },
  "_all": {
    "primaries": {
      "docs": {
        "count": 199991,
        "deleted": 27766
      },
      "store": {
        "size_in_bytes": 49856669,
        "throttle_time_in_millis": 0
      },
      "indexing": {
        "index_total": 227757,
        "index_time_in_millis": 26347,
        "index_current": 0,
        "index_failed": 0,
        "delete_total": 0,
        "delete_time_in_millis": 0,
        "delete_current": 0,
        "noop_update_total": 0,
        "is_throttled": false,
        "throttle_time_in_millis": 0
      },....
      ....
 "merges": {
    "current": 0,
    "current_docs": 0,
    "current_size_in_bytes": 0,
    "total": 0,
    "total_time_in_millis": 0,
    "total_docs": 0,
    "total_size_in_bytes": 0,
    "total_stopped_time_in_millis": 0,
    "total_throttled_time_in_millis": 0,
    "total_auto_throttle_in_bytes": 209715200
  },

My question

When I query the index, documents are missing (as the stats output above shows).

I don't understand what the problem could be.

Why do I have deleted documents ("deleted": 27766), and why, on top of that, are 9 more documents missing ("count": 199991)?
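To make the figures easier to compare, here is the arithmetic on the numbers quoted above as a small Python sketch. The variable names are only labels for the values taken from the stats output. Note that with these figures the gap between the intended 227,768 documents and `index_total` works out to 11 rather than 9.

```python
expected_docs = 227_768   # documents intended for insertion
count = 199_991           # docs.count from the stats
deleted = 27_766          # docs.deleted from the stats
index_total = 227_757     # indexing.index_total from the stats

# Live plus deleted documents add up exactly to the number of index operations.
print(count + deleted == index_total)   # True

# Gap between what was intended and what was indexed at all.
print(expected_docs - index_total)      # 11

# Gap between what was intended and what _count reports.
print(expected_docs - count)            # 27777
```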

Thanks!

Vitaly Peter

0 answers:

No answers yet