在Elasticsearch中,我将项目状态快照存储在仅追加方案中。 例如:
POST /item/item
{
"id": "1",
"time": "2018-09-19T00:00:00Z",
status": "ON_HOLD"
}
POST /item/item
{
"id": "2",
"time": "2018-09-19T00:01:00Z",
"status": "ON_HOLD"
}
POST /item/item
{
"id": "2",
"time": "2018-09-19T00:02:00Z",
"status": "DONE"
}
现在,我希望实现的是回答以下问题:哪些项目仍处于保留状态?(status==ON_HOLD
)。
在这种简单情况下,答案将是:
{
"id": "1",
"time": "2018-09-19T00:00:00Z",
status": "ON_HOLD"
}
因此,为了获得项目的最后状态,我在id
上使用了术语聚合,如下所示:
GET /item/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"id": {
"terms": {
"field": "id.keyword",
"size": 10
},
"aggs": {
"top_items": {
"top_hits": {
"size": 1,
"sort": [
{
"time": {
"order": "desc"
}
}
],
"_source": {
"includes": ["*"]
}
}
}
}
}
}
}
这为我提供了由ID标识的每个项目的最后可用状态:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"id": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "2",
"doc_count": 2,
"top_items": {
"hits": {
"total": 2,
"max_score": null,
"hits": [
{
"_index": "item",
"_type": "item",
"_id": "S-5eCGYBNyILygyml2jR",
"_score": null,
"_source": {
"id": "2",
"time": "2018-09-19T00:02:00Z",
"status": "DONE"
},
"sort": [
1537315320000
]
}
]
}
}
},
{
"key": "1",
"doc_count": 1,
"top_items": {
"hits": {
"total": 1,
"max_score": null,
"hits": [
{
"_index": "item",
"_type": "item",
"_id": "Se5eCGYBNyILygymjmg0",
"_score": null,
"_source": {
"id": "1",
"time": "2018-09-19T00:00:00Z",
"status": "ON_HOLD"
},
"sort": [
1537315200000
]
}
]
}
}
}
]
}
}
}
现在的问题是,我想在Elasticsearch端(而不是客户端)过滤结果(聚合后)。
我尝试了bucket_selector
聚合,但是它抱怨,因为top_hits
的结果不是数字或单值数字聚合。
我还尝试添加一个script_field来获取数字值,但之后似乎无法使用它:
"script_fields": {
"on_hold": {
"script": {
"lang": "painless",
"source": "doc['status.keyword'].value == 'ON_HOLD' ? 1 : 0"
}
}
}
我想在Elasticsearch方面做什么还是必须在客户端做?
PS:在聚合之前添加过滤器不会提供正确的结果,因为它将返回在任何时间点都是ON_HOLD
的项目。
编辑: 好吧,我到了:
GET /item/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"id": {
"terms": {
"field": "id.keyword",
"size": 50
},
"aggs": {
"top_item": {
"terms": {
"size": 1,
"field": "time",
"order": {
"_key": "desc"
}
},
"aggs": {
"on_hold": {
"filter": {
"term": {
"status.keyword": "ON_HOLD"
}
},
"aggs": {
"document": {
"top_hits": {
"size": 1,
"_source": ["*"]
}
}
}
}
}
}
}
}
}
}
top_hits聚合是一个指标,而不是存储桶聚合,因此它无法完成工作,必须最后使用。
最后一个问题是:过滤出的水桶留下空着的叶子: “点击数”:[]
有什么方法可以从结果树中删除以空叶结尾的分支吗?谢谢
答案 0 :(得分:0)
好的,我找到了解决该问题的完整方法,包括过滤掉聚合树中的空分支:
GET /item/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"id": {
"terms": {
"field": "id.keyword",
"size": 50
},
"aggs": {
"top_item": {
"terms": {
"size": 1,
"field": "time",
"order": {
"_key": "desc"
}
},
"aggs": {
"on_hold": {
"filter": {
"term": {
"status.keyword": "ON_HOLD"
}
},
"aggs": {
"document": {
"top_hits": {
"size": 1,
"_source": ["*"]
}
}
}
},
"remove_filtered": {
"bucket_selector": {
"buckets_path": {
"count": "on_hold._count"
},
"script": {
"source": "params.count != 0"
}
}
}
}
},
"remove_empty": {
"bucket_selector": {
"buckets_path": {
"count": "top_item._bucket_count"
},
"script": "params.count != 0"
}
}
}
}
}
}
这给出了预期的以下输出:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"id": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1",
"doc_count": 1,
"top_item": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1537315200000,
"key_as_string": "2018-09-19T00:00:00.000Z",
"doc_count": 1,
"on_hold": {
"doc_count": 1,
"document": {
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "item",
"_type": "item",
"_id": "HvywM2YB5Ei0wOZMeia9",
"_score": 1,
"_source": {
"id": "1",
"time": "2018-09-19T00:00:00Z",
"status": "ON_HOLD"
}
}
]
}
}
}
}
]
}
}
]
}
}
}