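I want to compute the median of a nested field. The nested field holds a list of objects with certain attributes, and I want to filter some of them out before computing the median. For example, suppose the nested field contains 10 objects, but only 7 of those 10 should be used when computing the median.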
query_median = {
"query": {
"bool": {
"filter": [
{
"term": {
"date": "2020-05-18"
}
},
{
"term": {
"group_name": "some_name"
}
}
]
}
},
"aggs": {
"median_value": {
"nested": {
"path": "people"
},
"aggs": {
"median": {
"percentiles": {
"field": "people.for_median_attr",
"percents": [50]
}
}
}
}
}
}
The query above works, but it applies no filter. When I add a further filter inside aggs, it gives me the same value as without any filter. Here is what I tried:
query_median = {
"query": {
"bool": {
"filter": [
{
"term": {
"date": "2020-05-18"
}
},
{
"term": {
"group_name": "some_name"
}
}
]
}
},
"aggs": {
"median_value": {
"nested": {
"path": "people"
},
"aggs": {
"filter_out": {
"filter": {
"bool": {
"must": [
{
"term": {
"people.attr_not_wanted1": False
},
"term": {
"people.attr_not_wanted2": False
}
}
]
}
},
"aggs": {
"median": {
"percentiles": {
"field": "people.for_median_attr",
"percents": [50]
}
}
}
}
}
}
}
}
Sample document:
{
"_index" : "some_index",
"_type" : "_doc",
"_id" : "some_id",
"_score" : 1.0,
"_source" : {
"date" : "2020-05-10",
"group_name" : "some_name",
"org_code" : "some_code",
"people" : [
{
"nickname" : "xxx",
"review_count" : 20.0,
"not_wanted_1" : false,
"not_wanted_2" : false
},
{
"nickname" : "yyy",
"review_count" : 18.0,
"not_wanted_1" : false,
"not_wanted_2" : false
},
{
"nickname" : "zzz",
"value_for_median" : 11.0,
"not_wanted_1" : true,
"not_wanted_2" : true
},
...
]
}
}
In this case, the median would be computed from only two numbers: 20 and 18.
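As a sanity check on the expected value, the filtering can be reproduced client-side in plain Python. This is only a sketch under assumptions: the `people` list mirrors the sample document above (the elided entries are left out), the helper name `expected_median` is made up for illustration, and the value is read from `review_count` because that is the field the two kept entries carry in the sample.

```python
from statistics import median

# Nested objects copied from the sample document above (the "..." entries are omitted).
people = [
    {"nickname": "xxx", "review_count": 20.0, "not_wanted_1": False, "not_wanted_2": False},
    {"nickname": "yyy", "review_count": 18.0, "not_wanted_1": False, "not_wanted_2": False},
    {"nickname": "zzz", "value_for_median": 11.0, "not_wanted_1": True, "not_wanted_2": True},
]


def expected_median(objects, field="review_count"):
    """Hypothetical helper: median of `field` over objects where both flags are false."""
    kept = [
        obj[field]
        for obj in objects
        if not obj["not_wanted_1"] and not obj["not_wanted_2"] and field in obj
    ]
    return median(kept)


print(expected_median(people))  # 19.0
```

With only 20 and 18 left after filtering, the median is their midpoint, 19. Elasticsearch's percentiles aggregation is approximate, so its result on larger data sets may differ slightly from an exact client-side calculation.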
Answer 0: (score: 1)
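You're almost there. You are only missing some curly braces in the nested filter: each term clause needs its own pair of braces inside must. You should also use true rather than false, because you want to keep those nested documents in order to compute their median.
Your query should look like this: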
{
"query": {
...
},
"aggs": {
"median_value": {
"nested": {
"path": "people"
},
"aggs": {
"filter_out": {
"filter": {
"bool": {
"must": [
{
"term": {
"people.not_wanted_1": true
}
},
{
"term": {
"people.not_wanted_2": true
}
}
]
}
},
"aggs": {
"median": {
"percentiles": {
"field": "people.value_for_median",
"percents": [
50
]
}
}
}
}
}
}
}
}
Result:
"aggregations" : {
"median_value" : {
"doc_count" : 3,
"filter_out" : {
"doc_count" : 1,
"median" : {
"values" : {
"50.0" : 11.0
}
}
}
}
}
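The braces matter because the attempt in the question is a Python dict literal: two "term" keys inside one pair of braces are the same dictionary key, so the second silently replaces the first before the request is ever sent. A minimal sketch using the field names from the question (the variable names `broken_must` and `fixed_must` are only for illustration):

```python
import json

# Both "term" keys sit in the same dict, as in the original attempt.
# Python keeps only the last value for a repeated key.
broken_must = [
    {
        "term": {"people.attr_not_wanted1": False},
        "term": {"people.attr_not_wanted2": False},
    }
]

# One dict per clause, as in the corrected query.
fixed_must = [
    {"term": {"people.attr_not_wanted1": False}},
    {"term": {"people.attr_not_wanted2": False}},
]

print(json.dumps(broken_must))
print(len(broken_must), len(fixed_must))  # 1 2
```

The first print shows a single clause on `people.attr_not_wanted2`, so the condition on `people.attr_not_wanted1` never reaches Elasticsearch.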
Answer 1: (score: 1)
Based on the documentation at https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filters-aggregation.html, you could try updating the "filter_out" part of your query to:
"filter_out" : {
"filters" : {
"filters" : [
{ "term" : { "people.attr_not_wanted1" : false }},
{ "term" : { "people.attr_not_wanted2" : false }}
]
}
}
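For completeness, here is a sketch of how that fragment might be assembled in Python together with the percentiles sub-aggregation from the question. The helper name `filter_out_with_filters` is hypothetical, and placing the same `median` sub-aggregation under `filters` is an assumption, since the fragment above does not show it.

```python
def filter_out_with_filters(field_flags, median_field):
    """Hypothetical helper: a `filters` aggregation with one anonymous bucket per term."""
    return {
        "filters": {
            "filters": [{"term": {field: flag}} for field, flag in field_flags]
        },
        # Assumed: the same percentiles sub-aggregation as in the question.
        "aggs": {
            "median": {
                "percentiles": {"field": median_field, "percents": [50]}
            }
        },
    }


filter_out = filter_out_with_filters(
    [("people.attr_not_wanted1", False), ("people.attr_not_wanted2", False)],
    "people.for_median_attr",
)
print(filter_out)
```

Note that a `filters` aggregation produces one bucket per listed filter rather than their intersection, so this yields a separate median for each term. To require both conditions on the same nested object, a single `filter` with `bool` and `must` is the closer fit.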