ElasticSearch: filter on an aggregated field without affecting the aggregation counts

Date: 2016-02-23 12:15:09

Tags: elasticsearch filter aggregation

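We use ElasticSearch to find offers based on 5 fields, such as some "free text", the offer state and the client name. We also need to aggregate on two of those fields: client name and offer state. So when someone enters some free text and we find 10 documents with state closed and 8 with state open, the "state filter" should show closed (10) and open (8).

Now the problem is that when I select the state "closed" to be included in the filter, the aggregation result for open changes to 0. I want it to remain 8. So how can I prevent the filter on the aggregated field from influencing the aggregation itself?

Here is the first query, searching for 'java':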

{
    "query": {
        "bool": {
            "filter": [
            ],
            "must": {
                "simple_query_string": {
                    "query" : "java"
                }
            }
        }
    },
    "aggs": {
        "OFFER_STATE_F": {
            "terms": {
                "size": 0,
                "field": "offer_state_f",
                "min_doc_count": 0
            }
        }
    },
    "from": 0,
    "size": 1,
    "fields": ["offer_id_ft", "offer_state_f"]
}

The result is:

{
  "hits": {
    "total": 960,
    "max_score": 0.89408284000000005,
    "hits": [
      {
        "_type": "offer",
        "_index": "select",
        "_id": "40542",
        "fields": {
          "offer_id_ft": [
            "40542"
          ],
          "offer_state_f": [
            "REJECTED"
          ]
        },
        "_score": 0.89408284000000005
      }
    ]
  },
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "timed_out": false,
  "aggregations": {
    "OFFER_STATE_F": {
      "buckets": [
        {
          "key": "REJECTED",
          "doc_count": 778
        },
        {
          "key": "ACCEPTED",
          "doc_count": 130
        },
        {
          "key": "CANCELED",
          "doc_count": 22
        },
        {
          "key": "WITHDRAWN",
          "doc_count": 13
        },
        {
          "key": "LONGLIST",
          "doc_count": 12
        },
        {
          "key": "SHORTLIST",
          "doc_count": 5
        },
        {
          "key": "INTAKE",
          "doc_count": 0
        }
      ],
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0
    }
  },
  "took": 2
}

As you can see, the OFFER_STATE_F buckets add up to the total number of hits (960). Now I include a state in the query, say "ACCEPTED". So my query becomes:

{
    "query": {
        "bool": {
            "filter": [
                {
                    "bool": {
                        "should": [
                            {
                                "term": {
                                    "offer_state_f": "ACCEPTED"
                                }
                            }
                        ]
                    }
                }            
            ],
            "must": {
                "simple_query_string": {
                    "query" : "java"
                }
            }
        }
    },
    "aggs": {
        "OFFER_STATE_F": {
            "terms": {
                "size": 0,
                "field": "offer_state_f",
                "min_doc_count": 0
            }
        }
    },
    "from": 0,
    "size": 1,
    "fields": ["offer_id_ft", "offer_state_f"]
}

What I want is 130 hits, while the OFFER_STATE_F buckets still add up to 960. But what I get is:

{
  "hits": {
    "total": 130,
    "max_score": 0.89408284000000005,
    "hits": [
      {
        "_type": "offer",
        "_index": "select",
        "_id": "16884",
        "fields": {
          "offer_id_ft": [
            "16884"
          ],
          "offer_state_f": [
            "ACCEPTED"
          ]
        },
        "_score": 0.89408284000000005
      }
    ]
  },
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "timed_out": false,
  "aggregations": {
    "OFFER_STATE_F": {
      "buckets": [
        {
          "key": "ACCEPTED",
          "doc_count": 130
        },
        {
          "key": "CANCELED",
          "doc_count": 0
        },
        {
          "key": "INTAKE",
          "doc_count": 0
        },
        {
          "key": "LONGLIST",
          "doc_count": 0
        },
        {
          "key": "REJECTED",
          "doc_count": 0
        },
        {
          "key": "SHORTLIST",
          "doc_count": 0
        },
        {
          "key": "WITHDRAWN",
          "doc_count": 0
        }
      ],
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0
    }
  },
  "took": 10
}

As you can see, only the ACCEPTED bucket is populated; all the other buckets are 0.

2 Answers:

Answer 0: (score: 1)

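You need to move your filter into the post_filter section instead of the query section.

That way, the filtering is applied after the aggregations have been computed: the aggregations run over everything the query matched, while the returned hits are only those that also match the filter.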
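To make the ordering concrete, here is a toy model in plain Python. It is only a sketch of the behaviour described above, not Elasticsearch code: the `search` function, the `DOCS` list and the `is_accepted` predicate are made up for illustration, and the per-state counts are copied from the first result in the question.

```python
from collections import Counter

# Toy stand-in for the index: one dict per document. The counts mirror the
# buckets returned for the 'java' query in the question.
STATE_COUNTS = {
    "REJECTED": 778, "ACCEPTED": 130, "CANCELED": 22,
    "WITHDRAWN": 13, "LONGLIST": 12, "SHORTLIST": 5,
}
DOCS = [
    {"offer_state_f": state}
    for state, count in STATE_COUNTS.items()
    for _ in range(count)
]


def search(docs, query, post_filter=None):
    """Hypothetical model of one search request (not the Elasticsearch API)."""
    # 1. The query (including its bool filters) selects the matching documents.
    matched = [d for d in docs if query(d)]
    # 2. Aggregations are computed over everything the query matched.
    buckets = Counter(d["offer_state_f"] for d in matched)
    # 3. post_filter narrows only the returned hits, after step 2.
    hits = [d for d in matched if post_filter is None or post_filter(d)]
    return hits, buckets


def is_accepted(doc):
    return doc["offer_state_f"] == "ACCEPTED"


# Filter inside the query: the aggregation only sees ACCEPTED documents.
hits_q, buckets_q = search(DOCS, query=is_accepted)

# Same filter as post_filter: hits are narrowed, buckets are untouched.
hits_pf, buckets_pf = search(DOCS, query=lambda d: True, post_filter=is_accepted)

print(len(hits_q), buckets_q["REJECTED"])    # filter in the query
print(len(hits_pf), buckets_pf["REJECTED"])  # filter in post_filter
```

Both calls return the same 130 hits; only the second one keeps the other buckets populated, which is the behaviour the question asks for.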

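Answer 1: (score: 0)

OK, I found the answer with the help of a colleague, and the fact is that Val is right. +1 for him. What I had done was put ALL my query filters in the post_filter, and that was the problem. I only need to put the filters for the fields I want to aggregate on in the post_filter. Thus: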

{
    "query": {
        "bool": {
            "filter": [
                {
                    "term": {
                        "broker_f": "false"
                    }
                }
            ],
            "must": {
                "simple_query_string": {
                    "query" : "java"
                }
            }
        }
    },
    "aggs": {
        "OFFER_STATE_F": {
            "terms": {
                "size": 0,
                "field": "offer_state_f",
                "min_doc_count": 0
            }
        }
    },
    "post_filter" : {
        "bool": {
            "should": [
                {
                    "term": {
                        "offer_state_f": "SHORTLIST"
                    }
                }
            ]
        }
    },
    "from": 0,
    "size": 1,
    "fields": ["offer_id_ft", "offer_state_f"]
}

Now the result is correct:

{
  "hits": {
    "total": 5,
    "max_score": 0.76667790000000002,
    "hits": [
      {
        "_type": "offer",
        "_index": "select",
        "_id": "24454",
        "fields": {
          "offer_id_ft": [
            "24454"
          ],
          "offer_state_f": [
            "SHORTLIST"
          ]
        },
        "_score": 0.76667790000000002
      }
    ]
  },
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "timed_out": false,
  "aggregations": {
    "OFFER_STATE_F": {
      "buckets": [
        {
          "key": "REJECTED",
          "doc_count": 777
        },
        {
          "key": "ACCEPTED",
          "doc_count": 52
        },
        {
          "key": "CANCELED",
          "doc_count": 22
        },
        {
          "key": "LONGLIST",
          "doc_count": 12
        },
        {
          "key": "WITHDRAWN",
          "doc_count": 12
        },
        {
          "key": "SHORTLIST",
          "doc_count": 5
        },
        {
          "key": "INTAKE",
          "doc_count": 0
        }
      ],
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0
    }
  },
  "took": 4
}
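
If the request body is assembled in client code, the same split can be sketched as a small helper. This is only an illustration, assuming the body is built as a plain Python dict before being sent; `build_search_body` and its parameter names are hypothetical, and the terms options are copied from the request above.

```python
import json


def build_search_body(text, query_filters, facet_filters, facet_field):
    """Build a search body as a dict. All names here are hypothetical.

    query_filters: filters on fields that are NOT aggregated (stay in the query).
    facet_filters: filters on the aggregated field (go into post_filter).
    """
    body = {
        "query": {
            "bool": {
                "filter": list(query_filters),
                "must": {"simple_query_string": {"query": text}},
            }
        },
        "aggs": {
            facet_field.upper(): {
                # Same terms options as in the request above.
                "terms": {"size": 0, "field": facet_field, "min_doc_count": 0}
            }
        },
    }
    # Only add post_filter when the user actually selected a facet value.
    if facet_filters:
        body["post_filter"] = {"bool": {"should": list(facet_filters)}}
    return body


body = build_search_body(
    text="java",
    query_filters=[{"term": {"broker_f": "false"}}],
    facet_filters=[{"term": {"offer_state_f": "SHORTLIST"}}],
    facet_field="offer_state_f",
)
print(json.dumps(body, indent=4))
```

The point of keeping the two filter lists separate is the one made above: a filter on broker_f should shrink the aggregation counts, while a filter on offer_state_f should only shrink the hits.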