Question

I have data in Elasticsearch with records like the following:

data    
    {
      start: 20,
      userid: "123",
    },
    {
      start: 34,
      userid: "234",
    },
    {
      start: 8,
      userid: "123",
    },
    {
      start: 12,
      userid: "234",
    },
    {
      start: 18,
      userid: "345",
    }

"start" is a long (a measure of time) and "userid" is a string. The data covers millions of users, and each user can have multiple records.

Question:

I need all user IDs whose first record (sorted by 'start') falls between times t1 and t2, for example between 10 and 15.

For userid 123, sorted times are: {8, 20}
For userid 234, sorted times are: {12, 34}
For userid 345, sorted times are: {18}

That is why it should return only userid "234": this is the only user whose first entry in the sorted time array falls between 10 and 15.

Answer
234

Answer 1

You can do this with the new bucket selector aggregation in ES 2.0.

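To test it, I set up a simple index using the data you provided (I added a couple of extra documents to make it clear that the aggregation is working):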

DELETE /test_index

POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"start":20,"userid":"123"}
{"index":{"_id":2}}
{"start":34,"userid":"234"}
{"index":{"_id":3}}
{"start":8,"userid":"123"}
{"index":{"_id":4}}
{"start":12,"userid":"234"}
{"index":{"_id":5}}
{"start":18,"userid":"345"}
{"index":{"_id":6}}
{"start":8,"userid":"555"}
{"index":{"_id":7}}
{"start":12,"userid":"555"}

Then I can get what you want with the following query:

POST /test_index/_search
{
   "size": 0,
   "aggs": {
      "userid_terms": {
         "terms": {
            "field": "userid"
         },
         "aggs": {
            "min_start": {
               "min": {
                  "field": "start"
               }
            },
            "min_start_filter": {
               "bucket_selector": {
                  "buckets_path": {
                     "min_start": "min_start"
                  },
                  "script": "min_start >= 10 && min_start <= 15"
               }
            }
         }
      }
   }
}

which returns:

{
   "took": 7,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 7,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "userid_terms": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "234",
               "doc_count": 2,
               "min_start": {
                  "value": 12
               }
            }
         ]
      }
   }
}
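
To spell out why only "234" survives, here is a minimal in-memory sketch of the same two steps (group by userid and take the minimum start, then keep only the groups whose minimum is in range). It is plain Python, not Elasticsearch code; the function name `users_with_first_start_between` is made up for illustration.

```python
# Same sample documents as in the bulk request above.
docs = [
    {"start": 20, "userid": "123"},
    {"start": 34, "userid": "234"},
    {"start": 8, "userid": "123"},
    {"start": 12, "userid": "234"},
    {"start": 18, "userid": "345"},
    {"start": 8, "userid": "555"},
    {"start": 12, "userid": "555"},
]


def users_with_first_start_between(docs, t1, t2):
    """Return the userids whose smallest 'start' lies in [t1, t2]."""
    # Step 1 (what terms + min do): track the minimum start per userid.
    min_start = {}
    for doc in docs:
        uid = doc["userid"]
        if uid not in min_start or doc["start"] < min_start[uid]:
            min_start[uid] = doc["start"]
    # Step 2 (what bucket_selector does): keep users whose minimum is in range.
    return sorted(uid for uid, m in min_start.items() if t1 <= m <= t2)


print(users_with_first_start_between(docs, 10, 15))  # ['234']
```

User "555" has a record with start 12, but its minimum is 8, so it is dropped just like "123".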

Here is the code I used to test it:

http://sense.qbox.io/gist/7427b87e878c23ce03bac199d6975434d66046f9
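
One thing to watch with millions of users: the terms aggregation only returns its top buckets (10 by default), and the bucket selector filters just those buckets, so the terms "size" probably needs raising. Below is a rough sketch that builds the same search body for arbitrary t1 and t2. The helper `build_first_start_query` and its `max_users` parameter are hypothetical names, and the script string simply follows the syntax used in the query above.

```python
import json


def build_first_start_query(t1, t2, max_users=10):
    """Build the search body for users whose minimum 'start' is in [t1, t2].

    max_users is a hypothetical parameter mapped to the terms "size";
    only that many userid buckets are produced and then filtered.
    """
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "aggs": {
            "userid_terms": {
                "terms": {"field": "userid", "size": max_users},
                "aggs": {
                    "min_start": {"min": {"field": "start"}},
                    "min_start_filter": {
                        "bucket_selector": {
                            "buckets_path": {"min_start": "min_start"},
                            # Same script syntax as the query in the answer.
                            "script": "min_start >= %d && min_start <= %d" % (t1, t2),
                        }
                    },
                },
            }
        },
    }


print(json.dumps(build_first_start_query(10, 15, max_users=1000), indent=3))
```

The printed JSON can be pasted as the body of POST /test_index/_search. How large "size" can reasonably go depends on the cluster, so treat the value 1000 as a placeholder.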
