How to add pagination to an aggregation in Elasticsearch?

Time: 2017-11-19 17:45:45

Tags: elasticsearch paging elasticsearch-aggregation

I have an Elasticsearch request as follows:

{
    "size":0,
    "aggs":{
        "group_by_state":{
            "terms":{
                "field":"poi_id"
            },
            "aggs":{
                "sum(price)":{
                    "sum":{
                        "field":"price"
                    }
                }
            }
        }
    }
}

I want to add pagination to this request, like this SQL:

select poi_id, sum(price) from table group by poi_id limit 0,2

I searched a lot and found a related link: https://github.com/elastic/elasticsearch/issues/4915

But I still could not figure out how to implement it.

Is there a way to do this in Elasticsearch itself, rather than in my application?

3 answers:

Answer 0: (score: 2)

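You can use the from and size parameters in your request. See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-from-size.html for details. Note that from and size page through the search hits; they do not offset or limit the buckets returned by the terms aggregation. Your request would look like this: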

{
    "from" : 0, 
    "size" : 10,
    "aggs":{
        "group_by_state":{
            "terms":{
                "field":"poi_id"
            },
            "aggs":{
                "sum(price)":{
                    "sum":{
                        "field":"price"
                    }
                }
            }
        }
    }
}

Answer 1: (score: 2)

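I am currently working on a solution for paging aggregation results. What you want to use is partition. This section of the official documentation is very helpful: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_filtering_values_with_partitions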

Adapting your example, the terms settings would be updated as follows.

{
    "size":0,
    "aggs":{
        "group_by_state":{
            "terms":{
                "field":"poi_id",
                "include": {
                    "partition": 0,
                    "num_partitions": 100
                },
                "size": 10000
            },
            "aggs":{
                "sum(price)":{
                    "sum":{
                        "field":"price"
                    }
                }
            }
        }
    }
}

This splits the terms into 100 partitions (num_partitions), returns at most 10k buckets per request (size), and retrieves the first partition (partition: 0).
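To walk through every partition rather than only the first, the same request can be sent once per partition number. Below is a minimal Python sketch of that loop. It assumes a hypothetical `search` callable that takes a request body and returns the parsed JSON response (for example a thin wrapper around whatever client you use). `build_partition_request` and `iter_partition_buckets` are illustrative names, not part of Elasticsearch, and the sketch uses `num_partitions`, the parameter name given in the terms aggregation reference.

```python
from typing import Any, Callable, Dict, Iterator


def build_partition_request(partition: int, num_partitions: int,
                            size: int = 10000) -> Dict[str, Any]:
    """Build the terms-aggregation request for one partition of poi_id values."""
    return {
        "size": 0,  # no search hits, only aggregation results
        "aggs": {
            "group_by_state": {
                "terms": {
                    "field": "poi_id",
                    "include": {
                        "partition": partition,
                        "num_partitions": num_partitions,
                    },
                    "size": size,
                },
                "aggs": {"sum(price)": {"sum": {"field": "price"}}},
            }
        },
    }


def iter_partition_buckets(search: Callable[[Dict[str, Any]], Dict[str, Any]],
                           num_partitions: int,
                           size: int = 10000) -> Iterator[Dict[str, Any]]:
    """Yield the buckets of every partition, one request per partition.

    `search` is an assumed callable: request body in, parsed response out.
    """
    for partition in range(num_partitions):
        response = search(build_partition_request(partition, num_partitions, size))
        yield from response["aggregations"]["group_by_state"]["buckets"]
```

Each partition then plays the role of one "page", so requesting partition n is roughly the equivalent of asking for page n, although the buckets are not globally sorted across partitions.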

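If the field you are aggregating on has more than 10k unique values (and you want to return all of them), you will need to increase the size value, or possibly compute size and the number of partitions dynamically based on the cardinality of your field. https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html#search-aggregations-metrics-cardinality-aggregation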
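As a rough illustration of that dynamic calculation, here is a small Python sketch. It assumes you first run a cardinality aggregation to estimate the number of unique values and then derive the partition count and size from it. The names `cardinality_request` and `plan_partitions`, the aggregation name `unique_values`, and the `safety_factor` headroom are all assumptions made for this example. The cardinality aggregation returns an approximate count and partitions are not guaranteed to be equally full, which is why size gets some headroom.

```python
import math
from typing import Any, Dict, Tuple


def cardinality_request(field: str) -> Dict[str, Any]:
    """Request body that estimates the number of unique values of `field`."""
    return {
        "size": 0,
        "aggs": {"unique_values": {"cardinality": {"field": field}}},
    }


def plan_partitions(cardinality: int, buckets_per_page: int,
                    safety_factor: float = 1.5) -> Tuple[int, int]:
    """Return (number of partitions, size) for a partitioned terms aggregation.

    safety_factor is an assumed headroom multiplier, not an Elasticsearch
    setting: it leaves room for partitions that end up fuller than average.
    """
    if buckets_per_page <= 0:
        raise ValueError("buckets_per_page must be positive")
    if cardinality <= 0:
        return 1, buckets_per_page
    num_partitions = max(1, math.ceil(cardinality / buckets_per_page))
    size = math.ceil(buckets_per_page * safety_factor)
    return num_partitions, size
```

The estimate would be read from `response["aggregations"]["unique_values"]["value"]` and passed to `plan_partitions`, whose two results go into the include and size settings of the terms aggregation.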

You may also want to use the show_term_doc_count_error setting to check whether your aggregation returns accurate counts. https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_per_bucket_document_count_error

Hope this helps.

Answer 2: (score: 0)

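Late to the party, but I just discovered 'composite' aggregations in v6.3+. These allow:
1. More 'SQL-like' grouping.
2. Pagination using "after_key".
They saved our day, and I hope they help others too.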

For example, to get the number of hits per hour between 2 dates, grouped by 5 fields:

GET myindex-idx/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"docType": "myDOcType"}}, 
        {"range": {
          "@date": {"gte": "2019-06-19T21:00:00", "lt": "2019-06-19T22:00:00"}
          }
        }
      ]
    }
  }, 
  "size": 0, 
  "aggs": {
    "mybuckets": {
      "composite": {
        "size": 100, 
        "sources": [
          {"@date": {
            "date_histogram": {
              "field": "@date", 
              "interval": "hour", 
              "format": "date_hour"}
            }
          }, 
          {"field_1": {"terms": {"field": "field_1"}}}, 
          {"field_2": {"terms": {"field": "field_2"}}}, 
          {"field_3": {"terms": {"field": "field_3"}}}, 
          {"field_4": {"terms": {"field": "field_4"}}}, 
          {"field_5": {"terms": {"field": "field_5"}}}
        ]
      }
    }
  }
}

This produces:

{
  "took": 255,
  "timed_out": false,
  "_shards": {
    "total": 80,
    "successful": 80,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 46989,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "mybuckets": {
      "after_key": {
        "@date": "2019-06-19T21",
        "field_1": 262,
        "field_2": 347,
        "field_3": 945,
        "field_4": 2258,
        "field_5": 0
      },
      "buckets": [
        {
          "key": {
            "@date": "2019-06-19T21",
            "field_1": 56,
            "field_2": 106,
            "field_3": 13224,
            "field_4": 46239,
            "field_5": 0
          },
          "doc_count": 3
        },
        {
          "key": {
            "@date": "2019-06-19T21",
            "field_1": 56,
            "field_2": 106,
            "field_3": 32338,
            "field_4": 76919,
            "field_5": 0
          },
          "doc_count": 2
        },
        ....

To fetch the next page, pass the "after_key" object from the response as the "after" object of the next query:

GET myindex-idx/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"docType": "myDOcType"}}, 
        {"range": {
          "@date": {"gte": "2019-06-19T21:00:00", "lt": "2019-06-19T22:00:00"}
          }
        }
      ]
    }
  }, 
  "size": 0, 
  "aggs": {
    "mybuckets": {
      "composite": {
        "size": 100, 
        "sources": [
          {"@date": {
            "date_histogram": {
              "field": "@date", 
              "interval": "hour", 
              "format": "date_hour"}
            }
          }, 
          {"field_1": {"terms": {"field": "field_1"}}}, 
          {"field_2": {"terms": {"field": "field_2"}}}, 
          {"field_3": {"terms": {"field": "field_3"}}}, 
          {"field_4": {"terms": {"field": "field_4"}}}, 
          {"field_5": {"terms": {"field": "field_5"}}}
        ],
        "after": {
          "@date": "2019-06-19T21",
          "field_1": 262,
          "field_2": 347,
          "field_3": 945,
          "field_4": 2258,
          "field_5": 0
        }
      }
    }
  }
}

Repeat this to page through the results until mybuckets comes back with no buckets.
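The paging loop itself can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `search` is a hypothetical callable that takes a request body and returns the parsed response, `iter_composite_buckets` is a made-up helper name, and `EXAMPLE_REQUEST` adapts the poi_id / sum(price) question to a composite aggregation with a page size of 2 (the aggregation name `by_poi` is also invented here).

```python
import copy
from typing import Any, Callable, Dict, Iterator

# Assumed adaptation of the question's request: group by poi_id, 2 buckets per page.
EXAMPLE_REQUEST: Dict[str, Any] = {
    "size": 0,
    "aggs": {
        "by_poi": {
            "composite": {
                "size": 2,
                "sources": [{"poi_id": {"terms": {"field": "poi_id"}}}],
            },
            "aggs": {"sum(price)": {"sum": {"field": "price"}}},
        }
    },
}


def iter_composite_buckets(search: Callable[[Dict[str, Any]], Dict[str, Any]],
                           body: Dict[str, Any],
                           agg_name: str) -> Iterator[Dict[str, Any]]:
    """Yield all buckets of a composite aggregation, page by page.

    `search` is an assumed callable: request body in, parsed response out.
    """
    body = copy.deepcopy(body)  # leave the caller's request untouched
    composite = body["aggs"][agg_name]["composite"]
    while True:
        result = search(body)["aggregations"][agg_name]
        buckets = result.get("buckets", [])
        if not buckets:
            return  # an empty page means everything has been read
        yield from buckets
        after_key = result.get("after_key")
        if after_key is None:
            return
        composite["after"] = after_key  # resume after the last bucket seen
```

Calling `iter_composite_buckets(search, EXAMPLE_REQUEST, "by_poi")` would then yield the buckets two at a time behind the scenes. Unlike a SQL "limit offset, count", this can only move forward from the last key, so jumping directly to an arbitrary page is not possible with this approach.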