I have an Elasticsearch request like the following:
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "poi_id"
      },
      "aggs": {
        "sum(price)": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}
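I want to add pagination to this request, like
select poi_id, sum(price) from table group by poi_id limit 0,2
I have searched a lot and found a related link: https://github.com/elastic/elasticsearch/issues/4915.
But I still don't see how to implement it.
Is there a way to do this in Elasticsearch itself, rather than in my application?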
Answer 0 (score: 2)
You can use the from and size parameters in the request. For details, see https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-from-size.html. Your request would look like this:
{
  "from": 0,
  "size": 10,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "poi_id"
      },
      "aggs": {
        "sum(price)": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}
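Answer 1 (score: 2)
I am currently working on a solution for paging aggregation results. What you want to use is partition. This section of the official documentation is very helpful:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_filtering_values_with_partitions
To adapt your example, the terms settings would be updated as follows.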
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "poi_id",
        "include": {
          "partition": 0,
          "num_of_partitions": 100
        },
        "size": 10000
      },
      "aggs": {
        "sum(price)": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}
This groups your results into 100 partitions (num_of_partitions), each with a maximum size of 10k (size), and retrieves the first such partition (partition: 0).
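To fetch the remaining partitions, the same request can be repeated with partition set to 1, 2, and so on. The loop below is only a sketch of that idea: `search` is a hypothetical callable standing in for whatever client call sends the body to Elasticsearch and returns the parsed response, and the helper names are made up for illustration.

```python
def terms_partition_body(partition, num_of_partitions, size=10000):
    """Build the terms request shown above for one partition."""
    return {
        "size": 0,
        "aggs": {
            "group_by_state": {
                "terms": {
                    "field": "poi_id",
                    "include": {
                        "partition": partition,
                        "num_of_partitions": num_of_partitions,
                    },
                    "size": size,
                },
                "aggs": {"sum(price)": {"sum": {"field": "price"}}},
            }
        },
    }


def iter_partition_buckets(search, num_of_partitions, size=10000):
    """Yield the buckets of every partition, one request per partition.

    `search` is an assumed callable: it takes a request body (dict)
    and returns the parsed JSON response (dict).
    """
    for partition in range(num_of_partitions):
        response = search(terms_partition_body(partition, num_of_partitions, size))
        yield from response["aggregations"]["group_by_state"]["buckets"]
```

Each partition acts as one "page". Terms are assigned to partitions by hash, so pages are not sorted relative to each other and are only roughly equal in size.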
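If the field you are aggregating on has more than 10k unique values (and you want to return all of them), you will need to increase the size value, or possibly compute size and num_of_partitions dynamically based on the cardinality of your field.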
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html#search-aggregations-metrics-cardinality-aggregation
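One way to derive the two values from a cardinality estimate is sketched below. The function names, the `poi_count` aggregation name, the `buckets_per_page` target and the 1.5 headroom factor are all illustrative assumptions, not values from the Elasticsearch documentation. The headroom is there because terms are spread over partitions by hash, so a partition can hold somewhat more than the average number of terms, and because the cardinality aggregation itself returns an approximate count.

```python
import math


def cardinality_body(field="poi_id"):
    """Request body that asks for the approximate number of unique values."""
    return {"size": 0, "aggs": {"poi_count": {"cardinality": {"field": field}}}}


def plan_partitions(cardinality, buckets_per_page=1000, headroom=1.5):
    """Return (num_of_partitions, size) for a given unique-value estimate.

    buckets_per_page and headroom are assumed tuning knobs.
    """
    num_of_partitions = max(1, math.ceil(cardinality / buckets_per_page))
    expected_per_partition = cardinality / num_of_partitions
    size = max(1, math.ceil(expected_per_partition * headroom))
    return num_of_partitions, size
```

For example, an estimate of 25,000 unique poi_id values with the defaults gives 25 partitions and a size of 1500.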
You may also want to use the show_term_doc_count_error setting to make sure your aggregation returns accurate counts. https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_per_bucket_document_count_error
Hope this helps.
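Answer 2 (score: 0)
Late to the party, but I just discovered 'composite' aggregations in v6.3+. These allow:
1. More SQL-like grouping
2. Pagination using "after_key".
They saved our day; I hope they help others too.
For example, to get the number of hits per hour between 2 dates, grouped by 5 fields: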
GET myindex-idx/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"docType": "myDOcType"}},
        {"range": {
          "@date": {"gte": "2019-06-19T21:00:00", "lt": "2019-06-19T22:00:00"}
        }}
      ]
    }
  },
  "size": 0,
  "aggs": {
    "mybuckets": {
      "composite": {
        "size": 100,
        "sources": [
          {"@date": {
            "date_histogram": {
              "field": "@date",
              "interval": "hour",
              "format": "date_hour"
            }
          }},
          {"field_1": {"terms": {"field": "field_1"}}},
          {"field_2": {"terms": {"field": "field_2"}}},
          {"field_3": {"terms": {"field": "field_3"}}},
          {"field_4": {"terms": {"field": "field_4"}}},
          {"field_5": {"terms": {"field": "field_5"}}}
        ]
      }
    }
  }
}
This produces:
{
  "took": 255,
  "timed_out": false,
  "_shards": {
    "total": 80,
    "successful": 80,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 46989,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "mybuckets": {
      "after_key": {
        "@date": "2019-06-19T21",
        "field_1": 262,
        "field_2": 347,
        "field_3": 945,
        "field_4": 2258,
        "field_5": 0
      },
      "buckets": [
        {
          "key": {
            "@date": "2019-06-19T21",
            "field_1": 56,
            "field_2": 106,
            "field_3": 13224,
            "field_4": 46239,
            "field_5": 0
          },
          "doc_count": 3
        },
        {
          "key": {
            "@date": "2019-06-19T21",
            "field_1": 56,
            "field_2": 106,
            "field_3": 32338,
            "field_4": 76919,
            "field_5": 0
          },
          "doc_count": 2
        },
        ....
To fetch the next page after such a query, pass the returned "after_key" object as the "after" object of the next query:
GET myindex-idx/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"docType": "myDOcType"}},
        {"range": {
          "@date": {"gte": "2019-06-19T21:00:00", "lt": "2019-06-19T22:00:00"}
        }}
      ]
    }
  },
  "size": 0,
  "aggs": {
    "mybuckets": {
      "composite": {
        "size": 100,
        "sources": [
          {"@date": {
            "date_histogram": {
              "field": "@date",
              "interval": "hour",
              "format": "date_hour"
            }
          }},
          {"field_1": {"terms": {"field": "field_1"}}},
          {"field_2": {"terms": {"field": "field_2"}}},
          {"field_3": {"terms": {"field": "field_3"}}},
          {"field_4": {"terms": {"field": "field_4"}}},
          {"field_5": {"terms": {"field": "field_5"}}}
        ],
        "after": {
          "@date": "2019-06-19T21",
          "field_1": 262,
          "field_2": 347,
          "field_3": 945,
          "field_4": 2258,
          "field_5": 0
        }
      }
    }
  }
}
Repeat this to page through the results until mybuckets comes back with no buckets.
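The paging loop could be driven from application code roughly as follows. This is a sketch under assumptions: `search` is a hypothetical callable that sends a request body to Elasticsearch and returns the parsed response, and `iter_composite_buckets` is a made-up helper name, not part of any client library.

```python
import copy


def iter_composite_buckets(search, body, agg_name="mybuckets"):
    """Yield every bucket of a composite aggregation, page by page.

    `search` is an assumed callable: request body (dict) in,
    parsed JSON response (dict) out.
    """
    body = copy.deepcopy(body)  # leave the caller's request untouched
    composite = body["aggs"][agg_name]["composite"]
    while True:
        result = search(body)["aggregations"][agg_name]
        buckets = result["buckets"]
        if not buckets:
            return  # an empty page means everything has been read
        yield from buckets
        after_key = result.get("after_key")
        if after_key is None:
            return  # nothing to resume from
        composite["after"] = after_key  # resume after the last returned key
```

Passing the first request body from above (without "after") yields all buckets across pages; the "size" inside "composite" controls how many buckets each round trip returns.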