Aggregated time-series query in Elasticsearch

Date: 2015-12-01 21:30:45

Tags: elasticsearch

I have data in Elasticsearch with records like the following:

data    
    {
      start: 20,
      userid: "123",
    },
    {
      start: 34,
      userid: "234",
    },
    {
      start: 8,
      userid: "123",
    },
    {
      start: 12,
      userid: "234",
    },
    {
      start: 18,
      userid: "345",
    }

"start" is a long (a time measurement) and "userid" is a string. The data covers millions of users, and the same user can have multiple records.

Question:

I need all the userids whose first record (sorted by 'start') lies between times t1 and t2, for example between 10 and 15.

For userid 123, sorted times are: {8, 20}
For userid 234, sorted times are: {12, 34}
For userid 345, sorted times are: {18}

So the query should return only userid "234", because this is the only user whose first entry in the sorted time array lies between 10 and 15.

Answer
234

1 Answer:

Answer 0: (score: 0)

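You can do this with the new bucket selector aggregation in ES 2.0.

To test it, I set up a simple index with the data you provided (I added a couple of documents to make it clear that the aggregation is working):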

DELETE /test_index

POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"start":20,"userid":"123"}
{"index":{"_id":2}}
{"start":34,"userid":"234"}
{"index":{"_id":3}}
{"start":8,"userid":"123"}
{"index":{"_id":4}}
{"start":12,"userid":"234"}
{"index":{"_id":5}}
{"start":18,"userid":"345"}
{"index":{"_id":6}}
{"start":8,"userid":"555"}
{"index":{"_id":7}}
{"start":12,"userid":"555"}
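
If the documents come from a script rather than from Sense, the same bulk payload can be generated programmatically. This is only a minimal sketch: the helper name build_bulk_body is hypothetical, and it assumes sequential ids starting at 1, as in the request above. It only builds the newline-delimited body and does not send it anywhere.

```python
import json


def build_bulk_body(records):
    """Serialize records as newline-delimited JSON for the _bulk endpoint.

    Hypothetical helper: each record gets an "index" action line with a
    sequential _id (starting at 1), followed by the document itself.
    """
    lines = []
    for doc_id, record in enumerate(records, start=1):
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(record))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


records = [
    {"start": 20, "userid": "123"},
    {"start": 34, "userid": "234"},
    {"start": 8, "userid": "123"},
    {"start": 12, "userid": "234"},
    {"start": 18, "userid": "345"},
    {"start": 8, "userid": "555"},
    {"start": 12, "userid": "555"},
]

bulk_body = build_bulk_body(records)
```

The resulting string could then be posted to /test_index/doc/_bulk with whatever HTTP client you prefer.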

I can then get what you want with the following query:

POST /test_index/_search
{
   "size": 0,
   "aggs": {
      "userid_terms": {
         "terms": {
            "field": "userid"
         },
         "aggs": {
            "min_start": {
               "min": {
                  "field": "start"
               }
            },
            "min_start_filter": {
               "bucket_selector": {
                  "buckets_path": {
                     "min_start": "min_start"
                  },
                  "script": "min_start >= 10 && min_start <= 15"
               }
            }
         }
      }
   }
}

Returns:

{
   "took": 7,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 7,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "userid_terms": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "234",
               "doc_count": 2,
               "min_start": {
                  "value": 12
               }
            }
         ]
      }
   }
}
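
Since t1 and t2 will vary in practice, the request body can be built from parameters. The following is a rough sketch, not a definitive implementation: build_first_start_query and max_users are hypothetical names, and the script string uses the same ES 2.0 syntax as the query above (newer versions that use Painless would, as far as I know, need params.min_start instead). Note that a terms aggregation returns only 10 buckets by default, so with millions of users the size has to be raised or the users processed in batches.

```python
import json


def build_first_start_query(t1, t2, max_users=10):
    """Build the search body for users whose earliest 'start' is in [t1, t2].

    Hypothetical helper; max_users is an assumed knob that maps to the
    terms aggregation's "size" (which defaults to 10 buckets).
    """
    script = "min_start >= %d && min_start <= %d" % (t1, t2)
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "aggs": {
            "userid_terms": {
                "terms": {"field": "userid", "size": max_users},
                "aggs": {
                    # Earliest 'start' per user.
                    "min_start": {"min": {"field": "start"}},
                    # Keep only buckets whose earliest 'start' is in range.
                    "min_start_filter": {
                        "bucket_selector": {
                            "buckets_path": {"min_start": "min_start"},
                            "script": script,
                        }
                    },
                },
            }
        },
    }


query = build_first_start_query(10, 15)
query_json = json.dumps(query, indent=3)
```

The JSON in query_json can be sent as the body of POST /test_index/_search.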

Here is the Sense code I used to test it:

http://sense.qbox.io/gist/7427b87e878c23ce03bac199d6975434d66046f9
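
As a sanity check on the expected result, the same logic (take the minimum start per userid, then keep the users whose minimum falls in the range) can be reproduced in plain Python on the seven test documents. This is just an illustration of what the aggregation computes; the function name is made up and it is not meant for millions of records.

```python
def users_with_first_start_between(records, t1, t2):
    """Return userids whose earliest 'start' lies in [t1, t2] (inclusive)."""
    first_start = {}
    for record in records:
        uid = record["userid"]
        start = record["start"]
        # Track the smallest 'start' seen so far for each user.
        if uid not in first_start or start < first_start[uid]:
            first_start[uid] = start
    return sorted(uid for uid, start in first_start.items() if t1 <= start <= t2)


records = [
    {"start": 20, "userid": "123"},
    {"start": 34, "userid": "234"},
    {"start": 8, "userid": "123"},
    {"start": 12, "userid": "234"},
    {"start": 18, "userid": "345"},
    {"start": 8, "userid": "555"},
    {"start": 12, "userid": "555"},
]

print(users_with_first_start_between(records, 10, 15))  # prints ['234']
```

User 555 is a useful test case here: it has a record with start 12, but its first record is at 8, so it is correctly excluded.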