I want to aggregate (or count) over the results of buckets. For example, given these documents:
{ "ID": 1, "customer_name": "a", "age": 21, "other_field": "x" },
{ "ID": 2, "customer_name": "a", "age": 25, "other_field": "x" },
{ "ID": 3, "customer_name": "a", "age": 32, "other_field": "x" },
{ "ID": 4, "customer_name": "b", "age": 24, "other_field": "x" },
{ "ID": 5, "customer_name": "b", "age": 33, "other_field": "x" },
{ "ID": 6, "customer_name": "b", "age": 17, "other_field": "y" },
{ "ID": 7, "customer_name": "c", "age": 34, "other_field": "x" },
{ "ID": 8, "customer_name": "c", "age": 26, "other_field": "y" }
My query is:
"query": {
"bool": {
"must": { "match": { "other_field": "x" }}
}
}
The matching (hit) documents have IDs [1, 2, 3, 4, 5, 7].
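As a quick sanity check of the hit list, here is a minimal Python sketch (plain Python, not Elasticsearch code) that applies the same filter to the sample documents. It assumes `match` on the single token "x" behaves like an exact equality test, which holds for this data but is only an approximation of how `match` analyzes text in general; `matching_ids` is a hypothetical helper.

```python
# Sample documents from the question.
DOCS = [
    {"ID": 1, "customer_name": "a", "age": 21, "other_field": "x"},
    {"ID": 2, "customer_name": "a", "age": 25, "other_field": "x"},
    {"ID": 3, "customer_name": "a", "age": 32, "other_field": "x"},
    {"ID": 4, "customer_name": "b", "age": 24, "other_field": "x"},
    {"ID": 5, "customer_name": "b", "age": 33, "other_field": "x"},
    {"ID": 6, "customer_name": "b", "age": 17, "other_field": "y"},
    {"ID": 7, "customer_name": "c", "age": 34, "other_field": "x"},
    {"ID": 8, "customer_name": "c", "age": 26, "other_field": "y"},
]


def matching_ids(docs, value):
    """Approximate the bool/must/match query: keep docs whose other_field equals value."""
    return [doc["ID"] for doc in docs if doc["other_field"] == value]


print(matching_ids(DOCS, "x"))  # [1, 2, 3, 4, 5, 7]
```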
What I want is to find the youngest hit document for each customer.
My aggregation query is:
"aggs": {
"distinct_user": {
"terms": {
"field": "customer_name",
"size": 100
},
"aggs": {
"youngest": {
"min": {
"field": "age"
}
}
}
}
}
The aggregation returns these buckets:
"buckets": [
{
"key": "a",
"doc_count": 3,
"youngest": {
"value": 21
}
},
{
"key": "b",
"doc_count": 2,
"youngest": {
"value": 24
}
},
{
"key": "c",
"doc_count": 1,
"youngest": {
"value": 34
}
}
]
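To make the bucket values above easy to check, here is a rough Python equivalent of a `terms` aggregation on `customer_name` with a `min` sub-aggregation on `age`, run over the hits of the query. This is an illustrative sketch, not how Elasticsearch computes it. The sort key (doc count descending, then key ascending) is an assumption chosen to match the order shown above, and `youngest_per_customer` is a hypothetical name.

```python
from collections import defaultdict

# Sample documents from the question.
DOCS = [
    {"ID": 1, "customer_name": "a", "age": 21, "other_field": "x"},
    {"ID": 2, "customer_name": "a", "age": 25, "other_field": "x"},
    {"ID": 3, "customer_name": "a", "age": 32, "other_field": "x"},
    {"ID": 4, "customer_name": "b", "age": 24, "other_field": "x"},
    {"ID": 5, "customer_name": "b", "age": 33, "other_field": "x"},
    {"ID": 6, "customer_name": "b", "age": 17, "other_field": "y"},
    {"ID": 7, "customer_name": "c", "age": 34, "other_field": "x"},
    {"ID": 8, "customer_name": "c", "age": 26, "other_field": "y"},
]


def youngest_per_customer(docs, other_field="x"):
    """Group hits by customer_name, counting docs and tracking the minimum age."""
    buckets = defaultdict(lambda: {"doc_count": 0, "youngest": None})
    for doc in docs:
        if doc["other_field"] != other_field:  # same filter as the query
            continue
        bucket = buckets[doc["customer_name"]]
        bucket["doc_count"] += 1
        if bucket["youngest"] is None or doc["age"] < bucket["youngest"]:
            bucket["youngest"] = doc["age"]
    # Order by doc_count descending, ties broken by key (assumed ordering).
    return sorted(
        ({"key": key, **values} for key, values in buckets.items()),
        key=lambda b: (-b["doc_count"], b["key"]),
    )


for bucket in youngest_per_customer(DOCS):
    print(bucket)
```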
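Then I want to compute the age distribution of these "youngest" values with a range aggregation, which should give:
21-30: 2, 31-40: 1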
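The distribution being asked for is a histogram over the per-bucket `youngest` values, not over document fields. As far as I know, the `range` aggregation buckets documents by a field (or script) value, so it cannot directly consume the output of a sibling aggregation. That is why a pipeline-based workaround is needed. The following Python sketch only shows the intended arithmetic. It treats the question's groups as inclusive on both ends; note that an Elasticsearch `range` aggregation uses an inclusive `from` and an exclusive `to`. `age_distribution` is a hypothetical helper.

```python
# Per-customer minimum ages, copied from the "youngest" values in the buckets above.
YOUNGEST = {"a": 21, "b": 24, "c": 34}

# Age groups from the question, treated here as inclusive on both ends.
AGE_GROUPS = [(21, 30), (31, 40)]


def age_distribution(youngest, groups):
    """Count how many customers' minimum age falls into each group."""
    return {
        f"{low}-{high}": sum(1 for age in youngest.values() if low <= age <= high)
        for low, high in groups
    }


print(age_distribution(YOUNGEST, AGE_GROUPS))  # {'21-30': 2, '31-40': 1}
```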
Is there any way to aggregate over bucket results? Or is there a workaround?
Answer 0 (score: 0)
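One way to achieve this is to use the bucket_selector and stats_bucket pipeline aggregations. You can add as many age groups as you need; I've added just the two relevant ones to illustrate the approach: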
POST test/_search
{
"size": 0,
"query": {
"bool": {
"must": {
"match": {
"other_field": "x"
}
}
}
},
"aggs": {
"customers_20_30": {
"terms": {
"field": "customer_name",
"size": 100
},
"aggs": {
"youngest": {
"min": {
"field": "age"
}
},
"20-30": {
"bucket_selector": {
"buckets_path": {
"youngest": "youngest"
},
"script": "params.youngest >= 20 && params.youngest < 30"
}
}
}
},
"customers_20_30_count": {
"stats_bucket": {
"buckets_path": "customers_20_30._count"
}
},
"customers_30_40": {
"terms": {
"field": "customer_name",
"size": 100
},
"aggs": {
"youngest": {
"min": {
"field": "age"
}
},
"30-40": {
"bucket_selector": {
"buckets_path": {
"youngest": "youngest"
},
"script": "params.youngest >= 30 && params.youngest < 40"
}
}
}
},
"customers_30_40_count": {
"stats_bucket": {
"buckets_path": "customers_30_40._count"
}
}
}
}
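Each age group in the query above repeats the terms/min aggregation and then lets `bucket_selector` drop every customer bucket whose `youngest` value falls outside the group. The Python sketch below imitates only that filtering step on the buckets from the question. The half-open bounds mirror the answer's scripts (`>= 20 && < 30`, `>= 30 && < 40`), and `bucket_selector` here is a hypothetical local function, not the Elasticsearch API.

```python
# Per-customer buckets produced by the terms + min aggregation (values from the question).
BUCKETS = [
    {"key": "a", "doc_count": 3, "youngest": 21},
    {"key": "b", "doc_count": 2, "youngest": 24},
    {"key": "c", "doc_count": 1, "youngest": 34},
]


def bucket_selector(buckets, low, high):
    """Keep buckets where low <= youngest < high, like the answer's selector scripts."""
    return [b for b in buckets if low <= b["youngest"] < high]


for low, high in [(20, 30), (30, 40)]:
    kept = bucket_selector(BUCKETS, low, high)
    print(f"{low}-{high}: keys={[b['key'] for b in kept]} count={len(kept)}")
```

Under this design each age group needs its own copy of the terms aggregation, so the request grows linearly with the number of groups.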
In the response you will get:
"customers_20_30_count" : {
"count" : 2, <--- 2 buckets for 20-30
"min" : 2.0,
"max" : 3.0,
"avg" : 2.5,
"sum" : 5.0
},
"customers_30_40_count" : {
"count" : 1, <--- 1 bucket for 30-40
"min" : 1.0,
"max" : 1.0,
"avg" : 1.0,
"sum" : 1.0
}
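In the `stats_bucket` output, `count` is the number of customer buckets that survived the selector, which is the per-group customer count the question asks for. `min`, `max`, `avg` and `sum` are computed over each surviving bucket's `_count` (its number of hit documents): for 20-30 these are the doc counts 3 (customer a) and 2 (customer b). A minimal Python sketch of that arithmetic follows. It assumes a non-empty input list, and `stats_bucket` here is a hypothetical local function.

```python
def stats_bucket(values):
    """Summarize per-bucket values the way the stats_bucket fields read (non-empty input assumed)."""
    total = sum(values)
    return {
        "count": len(values),
        "min": float(min(values)),
        "max": float(max(values)),
        "avg": total / len(values),
        "sum": float(total),
    }


print(stats_bucket([3, 2]))  # doc counts of buckets a and b (20-30)
print(stats_bucket([1]))     # doc count of bucket c (30-40)
```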