I have data in Elasticsearch with records like the following:
data
{
start: 20,
userid: "123",
},
{
start: 34,
userid: "234",
},
{
start: 8,
userid: "123",
},
{
start: 12,
userid: "234",
},
{
start: 18,
userid: "345",
}
"start" is a long (a time value) and "userid" is a string. The data covers millions of users, and a single user can have multiple records.
Question:
I need all userids whose first record (sorted by "start") falls between times t1 and t2, for example between 10 and 15.
For userid 123, sorted times are: {8, 20}
For userid 234, sorted times are: {12, 34}
For userid 345, sorted times are: {18}
So it should return only userid "234", because that is the only user whose first entry in the sorted time array falls between 10 and 15.
Answer
234
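To make the expected result concrete, here is a minimal sketch of the rule in plain Python, outside Elasticsearch. The function name users_with_first_start_between and the in-memory records list are hypothetical, introduced only for illustration; the sketch assumes the range is inclusive on both ends.

```python
def users_with_first_start_between(records, t1, t2):
    """Return userids whose smallest `start` lies in [t1, t2] (inclusive)."""
    first_start = {}
    for rec in records:
        uid = rec["userid"]
        # Track the minimum start per user; this is the "first record"
        # once each user's records are sorted by start.
        if uid not in first_start or rec["start"] < first_start[uid]:
            first_start[uid] = rec["start"]
    return sorted(uid for uid, s in first_start.items() if t1 <= s <= t2)


# The sample records from the question.
records = [
    {"start": 20, "userid": "123"},
    {"start": 34, "userid": "234"},
    {"start": 8, "userid": "123"},
    {"start": 12, "userid": "234"},
    {"start": 18, "userid": "345"},
]

print(users_with_first_start_between(records, 10, 15))  # ['234']
```

Taking the minimum of "start" per user is equivalent to sorting each user's records and reading the first one, which is why a min-based approach can answer the question.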
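Answer 0: (score: 0)
You can do this with the new bucket selector aggregation in ES 2.0.
To test it, I set up a simple index with the data you provided (I added a couple of extra documents to make it clear that the aggregation is working):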
DELETE /test_index
POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"start":20,"userid":"123"}
{"index":{"_id":2}}
{"start":34,"userid":"234"}
{"index":{"_id":3}}
{"start":8,"userid":"123"}
{"index":{"_id":4}}
{"start":12,"userid":"234"}
{"index":{"_id":5}}
{"start":18,"userid":"345"}
{"index":{"_id":6}}
{"start":8,"userid":"555"}
{"index":{"_id":7}}
{"start":12,"userid":"555"}
I can then get what you want with the following query:
POST /test_index/_search
{
  "size": 0,
  "aggs": {
    "userid_terms": {
      "terms": {
        "field": "userid"
      },
      "aggs": {
        "min_start": {
          "min": {
            "field": "start"
          }
        },
        "min_start_filter": {
          "bucket_selector": {
            "buckets_path": {
              "min_start": "min_start"
            },
            "script": "min_start >= 10 && min_start <= 15"
          }
        }
      }
    }
  }
}
which returns:
{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 7,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "userid_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "234",
          "doc_count": 2,
          "min_start": {
            "value": 12
          }
        }
      ]
    }
  }
}
Here is the code I used to test it:
http://sense.qbox.io/gist/7427b87e878c23ce03bac199d6975434d66046f9
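If you need just the list of userids on the client side, the bucket keys can be pulled out of the response. Below is a minimal Python sketch that assumes the response has already been parsed into a dict shaped like the one shown above; the helper name matching_userids is hypothetical, and the dict here is a hand-trimmed copy of that response rather than the output of a live cluster.

```python
def matching_userids(response):
    """Collect the bucket keys (userids) that survived the bucket selector."""
    buckets = response["aggregations"]["userid_terms"]["buckets"]
    return [bucket["key"] for bucket in buckets]


# Trimmed copy of the response shown above (assumed already JSON-decoded).
response = {
    "aggregations": {
        "userid_terms": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "234", "doc_count": 2, "min_start": {"value": 12}}
            ],
        }
    }
}

print(matching_userids(response))  # ['234']
```

Note that a terms aggregation returns only 10 buckets by default, so with millions of users you would probably need to raise its "size" for the filter to see every user.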