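We use ElasticSearch to find offers based on 5 fields, such as some "free text", the offer state and the client name. We also need to aggregate on two of those fields: client name and offer state. So when someone enters some free text and we find 10 documents with state closed and 8 with state open, the "state filter" should show closed (10) and open (8).
The problem is that when I select the state "closed" to be included in the filter, the aggregation result for open changes to 0. I want it to stay at 8. So how can I prevent the filter on an aggregated field from affecting the aggregation itself?
Here is the first query, searching for 'java':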
{
  "query": {
    "bool": {
      "filter": [],
      "must": {
        "simple_query_string": {
          "query": "java"
        }
      }
    }
  },
  "aggs": {
    "OFFER_STATE_F": {
      "terms": {
        "size": 0,
        "field": "offer_state_f",
        "min_doc_count": 0
      }
    }
  },
  "from": 0,
  "size": 1,
  "fields": ["offer_id_ft", "offer_state_f"]
}
The result is:
{
  "hits": {
    "total": 960,
    "max_score": 0.89408284000000005,
    "hits": [
      {
        "_type": "offer",
        "_index": "select",
        "_id": "40542",
        "fields": {
          "offer_id_ft": ["40542"],
          "offer_state_f": ["REJECTED"]
        },
        "_score": 0.89408284000000005
      }
    ]
  },
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "timed_out": false,
  "aggregations": {
    "OFFER_STATE_F": {
      "buckets": [
        { "key": "REJECTED", "doc_count": 778 },
        { "key": "ACCEPTED", "doc_count": 130 },
        { "key": "CANCELED", "doc_count": 22 },
        { "key": "WITHDRAWN", "doc_count": 13 },
        { "key": "LONGLIST", "doc_count": 12 },
        { "key": "SHORTLIST", "doc_count": 5 },
        { "key": "INTAKE", "doc_count": 0 }
      ],
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0
    }
  },
  "took": 2
}
As you can see, the OFFER_STATE_F buckets sum to the total number of hits (960). Now I include one state in the query, say "ACCEPTED". So my query becomes:
{
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "should": [
              {
                "term": {
                  "offer_state_f": "ACCEPTED"
                }
              }
            ]
          }
        }
      ],
      "must": {
        "simple_query_string": {
          "query": "java"
        }
      }
    }
  },
  "aggs": {
    "OFFER_STATE_F": {
      "terms": {
        "size": 0,
        "field": "offer_state_f",
        "min_doc_count": 0
      }
    }
  },
  "from": 0,
  "size": 1,
  "fields": ["offer_id_ft", "offer_state_f"]
}
What I want is 130 hits, while the OFFER_STATE_F buckets still sum to 960. But what I get is:
{
  "hits": {
    "total": 130,
    "max_score": 0.89408284000000005,
    "hits": [
      {
        "_type": "offer",
        "_index": "select",
        "_id": "16884",
        "fields": {
          "offer_id_ft": ["16884"],
          "offer_state_f": ["ACCEPTED"]
        },
        "_score": 0.89408284000000005
      }
    ]
  },
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "timed_out": false,
  "aggregations": {
    "OFFER_STATE_F": {
      "buckets": [
        { "key": "ACCEPTED", "doc_count": 130 },
        { "key": "CANCELED", "doc_count": 0 },
        { "key": "INTAKE", "doc_count": 0 },
        { "key": "LONGLIST", "doc_count": 0 },
        { "key": "REJECTED", "doc_count": 0 },
        { "key": "SHORTLIST", "doc_count": 0 },
        { "key": "WITHDRAWN", "doc_count": 0 }
      ],
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0
    }
  },
  "took": 10
}
As you can see, only the ACCEPTED bucket is filled; all the others are 0.
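Answer 0 (score: 1)
You need to move your filter into the post_filter section instead of the query section.
That way, the filter is applied after the aggregations have been computed, so you aggregate over the whole set of documents matched by the query but only get back the hits that match your filter.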
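To make the order of operations concrete, here is a toy model in plain Python. It is not Elasticsearch and does not call it; the `search` function and the in-memory `DOCS` list are made up for illustration, using the 10 closed / 8 open example from the question.

```python
from collections import Counter

# Hypothetical in-memory documents: 10 closed and 8 open offers matching "java",
# plus 3 unrelated offers that the free-text query should not match.
DOCS = (
    [{"text": "java developer", "state": "closed"}] * 10
    + [{"text": "java developer", "state": "open"}] * 8
    + [{"text": "python developer", "state": "open"}] * 3
)


def search(docs, text, query_filter=None, post_filter=None):
    """Toy model of: query (with its filters) -> aggregations -> post_filter -> hits."""
    # 1. The query, including any filter placed inside it, selects the matching set.
    matched = [d for d in docs if text in d["text"]]
    if query_filter is not None:
        matched = [d for d in matched if d["state"] == query_filter]
    # 2. Aggregations are computed over that matching set.
    buckets = Counter(d["state"] for d in matched)
    # 3. post_filter narrows only the returned hits, after the aggregations.
    hits = matched
    if post_filter is not None:
        hits = [d for d in matched if d["state"] == post_filter]
    return len(hits), dict(buckets)


# Filter inside the query: the "open" bucket disappears from the counts.
print(search(DOCS, "java", query_filter="closed"))  # (10, {'closed': 10})
# Same filter as post_filter: 10 hits, but both buckets keep their counts.
print(search(DOCS, "java", post_filter="closed"))   # (10, {'closed': 10, 'open': 8})
```

In both calls the number of hits is the same; only the set of documents the buckets are counted over differs.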
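Answer 1 (score: 0)
OK, with the help of a colleague I found the answer. The fact is, Val is right, so +1 for him. What I had done was put ALL of my query filters in the post_filter, and that was the problem. Only the filters on the fields I want to aggregate on belong in the post_filter; the other filters stay in the query. Thus: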
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "broker_f": "false"
          }
        }
      ],
      "must": {
        "simple_query_string": {
          "query": "java"
        }
      }
    }
  },
  "aggs": {
    "OFFER_STATE_F": {
      "terms": {
        "size": 0,
        "field": "offer_state_f",
        "min_doc_count": 0
      }
    }
  },
  "post_filter": {
    "bool": {
      "should": [
        {
          "term": {
            "offer_state_f": "SHORTLIST"
          }
        }
      ]
    }
  },
  "from": 0,
  "size": 1,
  "fields": ["offer_id_ft", "offer_state_f"]
}
Now the result is correct:
{
  "hits": {
    "total": 5,
    "max_score": 0.76667790000000002,
    "hits": [
      {
        "_type": "offer",
        "_index": "select",
        "_id": "24454",
        "fields": {
          "offer_id_ft": ["24454"],
          "offer_state_f": ["SHORTLIST"]
        },
        "_score": 0.76667790000000002
      }
    ]
  },
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "timed_out": false,
  "aggregations": {
    "OFFER_STATE_F": {
      "buckets": [
        { "key": "REJECTED", "doc_count": 777 },
        { "key": "ACCEPTED", "doc_count": 52 },
        { "key": "CANCELED", "doc_count": 22 },
        { "key": "LONGLIST", "doc_count": 12 },
        { "key": "WITHDRAWN", "doc_count": 12 },
        { "key": "SHORTLIST", "doc_count": 5 },
        { "key": "INTAKE", "doc_count": 0 }
      ],
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0
    }
  },
  "took": 4
}
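If the request body is built in application code, the split described above could look roughly like the sketch below. The function name `build_search_body` and its parameters are my own invention, not part of any Elasticsearch client; it only produces a dict shaped like the query above (including the `"size": 0` terms setting copied from it, which may not be accepted by other Elasticsearch versions).

```python
import json


def build_search_body(text, selected, facet_fields):
    """Hypothetical helper: route each selected filter to the query or to post_filter.

    text         -- free-text search string
    selected     -- dict mapping field name -> list of selected values
    facet_fields -- fields that are aggregated on (their filters go to post_filter)
    """
    query_filters, post_filters = [], []
    for field, values in selected.items():
        # One OR-group per field, same shape as the bool/should clause above.
        clause = {"bool": {"should": [{"term": {field: v}} for v in values]}}
        if field in facet_fields:
            post_filters.append(clause)   # must not shrink the bucket counts
        else:
            query_filters.append(clause)  # should shrink hits and buckets alike

    body = {
        "query": {
            "bool": {
                "filter": query_filters,
                "must": {"simple_query_string": {"query": text}},
            }
        },
        "aggs": {
            # Aggregation named after the upper-cased field, as in the query above.
            f.upper(): {"terms": {"field": f, "size": 0, "min_doc_count": 0}}
            for f in facet_fields
        },
    }
    if len(post_filters) == 1:
        body["post_filter"] = post_filters[0]
    elif post_filters:
        # Several faceted fields selected: all of them must match the hit.
        body["post_filter"] = {"bool": {"filter": post_filters}}
    return body


body = build_search_body(
    "java",
    {"broker_f": ["false"], "offer_state_f": ["SHORTLIST"]},
    ["offer_state_f"],
)
print(json.dumps(body, indent=2))
```

One caveat, as far as I understand post_filter: it affects none of the aggregations, so with two faceted fields (for example client name and offer state) a selected client would not narrow the offer state counts either. If that matters, each aggregation probably needs its own filter for the other fields' selections.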