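I'll preface this by saying I'm new to Elasticsearch, so there may be a simple answer. Nothing I've read so far has clicked in a way that lets me achieve the following.
Here is a very simplified scenario. I have a series of user activities, like this: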
timestamp: t0, user: mike, result: failed
timestamp: t1, user: anne, result: failed
timestamp: t2, user: bob, result: success
timestamp: t3, user: tom, result: success
timestamp: t4, user: jane, result: failed
timestamp: t5, user: anne, result: success
timestamp: t6, user: tom, result: failed
timestamp: t7, user: jane, result: failed
timestamp: t8, user: mike, result: success
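I need to identify all users who had to work hard to get a successful result (I'm ignoring those who never succeeded). To do that, all I really need is to find the records where a user failed one or more times before succeeding.
Given the sequence above, the result would be the records for the 'anne' and 'mike' users.
We ignore 'jane' because there was no success, and we ignore 'bob' because there was no failure. We also ignore 'tom', because he succeeded first and then failed, which is a different case again.
I can do this relatively easily in SQL, but I'm struggling to do it in Elasticsearch.
How would you form a query that answers this question?
Or, even better, how could I reframe my problem to achieve the same result?
Thanks!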
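Answer 0 (score: 2)
Great question. It took a little effort to figure out, but I managed to get it working using the new bucket selector aggregation in ES 2.0.
I had to change the timestamps to type "integer" to get it to work (but it would work with dates as well).
I created a simple index and added your data with a _bulk request: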
PUT /test_index
POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"timestamp": 0,"user":"mike","result":"failed"}
{"index":{"_id":2}}
{"timestamp": 1,"user":"anne","result":"failed"}
{"index":{"_id":3}}
{"timestamp": 2,"user":"bob","result":"success"}
{"index":{"_id":4}}
{"timestamp": 3,"user":"tom","result":"success"}
{"index":{"_id":5}}
{"timestamp": 4,"user":"jane","result":"failed"}
{"index":{"_id":6}}
{"timestamp": 5,"user":"anne","result":"success"}
{"index":{"_id":7}}
{"timestamp": 6,"user":"tom","result":"failed"}
{"index":{"_id":8}}
{"timestamp": 7,"user":"jane","result":"failed"}
{"index":{"_id":9}}
{"timestamp": 8,"user":"mike","result":"success"}
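Then I was able to get what you're asking for (I think) with the following query. Under the top-level "user_terms" aggregation, I set up three sub-aggregations: "failed_filter" selects the documents with "result": "failed", and a sub-aggregation then finds the maximum timestamp in that group; "success_filter" selects the documents with "result": "success", and a sub-aggregation then finds the maximum timestamp in that group; "failed_lt_success_filter" keeps only the user buckets for which the (maximum) timestamp attached to a failed value is smaller than the (maximum) timestamp attached to a success value. Whew.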
POST /test_index/_search
{
   "size": 0,
   "aggregations": {
      "user_terms": {
         "terms": {
            "field": "user"
         },
         "aggs": {
            "failed_filter": {
               "filter": { "term": { "result": "failed" } },
               "aggs": {
                  "max_timestamp": { "max": { "field": "timestamp" } }
               }
            },
            "success_filter": {
               "filter": { "term": { "result": "success" } },
               "aggs": {
                  "max_timestamp": { "max": { "field": "timestamp" } }
               }
            },
            "failed_lt_success_filter": {
               "bucket_selector": {
                  "buckets_path": {
                     "failed_timestamp": "failed_filter.max_timestamp",
                     "success_timestamp": "success_filter.max_timestamp"
                  },
                  "script": "failed_timestamp < success_timestamp"
               }
            }
         }
      }
   }
}
This returns:
{
   "took": 11,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 9,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "user_terms": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "anne",
               "doc_count": 2,
               "success_filter": {
                  "doc_count": 1,
                  "max_timestamp": {
                     "value": 5
                  }
               },
               "failed_filter": {
                  "doc_count": 1,
                  "max_timestamp": {
                     "value": 1
                  }
               }
            },
            {
               "key": "mike",
               "doc_count": 2,
               "success_filter": {
                  "doc_count": 1,
                  "max_timestamp": {
                     "value": 8
                  }
               },
               "failed_filter": {
                  "doc_count": 1,
                  "max_timestamp": {
                     "value": 0
                  }
               }
            }
         ]
      }
   }
}
Here is some code I used to work out the problem:
http://sense.qbox.io/gist/06083e06191445a44610f32baf1dd45c7370401e
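If it helps to see the bucket logic outside Elasticsearch, here is a minimal pure-Python sketch of the same idea, run against the sample data from the question. It is only an illustration of the comparison the aggregation makes, not a replacement for the query; the names EVENTS and users_failed_before_success are made up for this example.

```python
# Sample events from the question: (timestamp, user, result).
EVENTS = [
    (0, "mike", "failed"),
    (1, "anne", "failed"),
    (2, "bob", "success"),
    (3, "tom", "success"),
    (4, "jane", "failed"),
    (5, "anne", "success"),
    (6, "tom", "failed"),
    (7, "jane", "failed"),
    (8, "mike", "success"),
]


def users_failed_before_success(events):
    """Return users whose latest failure precedes their latest success."""
    # Plays the role of the two filter + max sub-aggregations:
    # result -> user -> largest timestamp seen for that result.
    latest = {"failed": {}, "success": {}}
    for timestamp, user, result in events:
        previous = latest[result].get(user)
        if previous is None or timestamp > previous:
            latest[result][user] = timestamp

    # Plays the role of the bucket selector: keep a user only when both
    # values exist and the failed timestamp is the smaller one.
    return sorted(
        user
        for user, failed_ts in latest["failed"].items()
        if user in latest["success"] and failed_ts < latest["success"][user]
    )


print(users_failed_before_success(EVENTS))  # prints ['anne', 'mike']
```

Note that comparing the two maxima is what drops 'tom': his latest failure (6) comes after his latest success (3). By the same comparison, a user who fails, succeeds, and then fails again would also be dropped, so whether that is what you want depends on how you define the case.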