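Is there a way to run a stats aggregation on a nested field so that only the maximum value among the selected nested entries of each document is considered for the stats evaluation?
Mapping: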
{
"mappings": {
"doc": {
"properties": {
"student_id": {
"type": "long"
},
"test_scores": {
"type": "nested",
"properties": {
"test_id": {
"type": "long"
},
"score": {
"type": "double"
}
}
}
}
}
}
}
Sample data:
{
"student_id": 1,
"test_scores": [
{
"test_id": 101,
"score": 90
},
{
"test_id": 102,
"score": 70
},
{
"test_id": 103,
"score": 80
}
]
}
{
"student_id": 2,
"test_scores": [
{
"test_id": 101,
"score": 80
},
{
"test_id": 102,
"score": 90
},
{
"test_id": 103,
"score": 85
}
]
}
{
"student_id": 3,
"test_scores": [
{
"test_id": 101,
"score": 30
},
{
"test_id": 102,
"score": 40
},
{
"test_id": 103,
"score": 55
}
]
}
Filtering query:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"term": {
"student_id": 1
}
},
{
"nested": {
"path": "test_scores",
"query": {
"terms": {
"test_scores.test_id": [101]
}
}
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"student_id": 2
}
},
{
"nested": {
"path": "test_scores",
"query": {
"terms": {
"test_scores.test_id": [101, 103]
}
}
}
}
]
}
}
]
}
}
}
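Requirement:
Based on the filtering query above, I need to find the min and max (stats aggregation) of test_scores.score across students, such that only the maximum test_scores.score of each student_id is considered.
From the documents filtered by the query above: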
doc:
student_id: 1
test_scores.test_id: 101
test_scores.score: 90
test_scores.score (To be considered for aggregation): 90
doc:
student_id: 2
test_scores.test_id: 101, 103
test_scores.score: 80, 85
test_scores.score (To be considered for aggregation): 85
Expected overall stats on test_scores.score:
max: 90
min: 85
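To make the requirement concrete, here is a minimal plain-Python sketch of the computation I am after, run against the sample data above. It is not an Elasticsearch solution, only an illustration of the expected arithmetic. The names `selected_tests` and `expected_stats` are hypothetical and exist only for this illustration; `selected_tests` restates which test_ids the filtering query selects for each student.

```python
# Sample documents, copied from the sample data above.
docs = [
    {"student_id": 1, "test_scores": [
        {"test_id": 101, "score": 90},
        {"test_id": 102, "score": 70},
        {"test_id": 103, "score": 80}]},
    {"student_id": 2, "test_scores": [
        {"test_id": 101, "score": 80},
        {"test_id": 102, "score": 90},
        {"test_id": 103, "score": 85}]},
    {"student_id": 3, "test_scores": [
        {"test_id": 101, "score": 30},
        {"test_id": 102, "score": 40},
        {"test_id": 103, "score": 55}]},
]

# Hypothetical restatement of the filtering query:
# student_id -> set of test_ids selected for that student.
selected_tests = {1: {101}, 2: {101, 103}}


def expected_stats(docs, selected_tests):
    """Take the max selected score per student, then compute stats over those maxima."""
    per_student_max = []
    for doc in docs:
        wanted = selected_tests.get(doc["student_id"])
        if wanted is None:
            continue  # student is not matched by the filtering query
        scores = [t["score"] for t in doc["test_scores"] if t["test_id"] in wanted]
        if scores:
            per_student_max.append(max(scores))
    if not per_student_max:
        return {"count": 0}
    total = sum(per_student_max)
    return {
        "count": len(per_student_max),
        "min": min(per_student_max),
        "max": max(per_student_max),
        "avg": total / len(per_student_max),
        "sum": total,
    }


result = expected_stats(docs, selected_tests)
print(result)
```

Student 3 is skipped because the filtering query does not match it, and student 2 contributes 85 (the larger of 80 and 85), not 90, because test 102 is not selected for that student.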
After searching online, I found this solution:
{
"aggs": {
"score_stats": {
"stats": {
"script": "if(doc[\"student_id\"].value == 1){
return params._source[\"test_scores\"]
.stream()
.filter(nested -> nested.test_id == 101)
.mapToDouble(nested -> nested.score)
.max()
.orElse(0)
} else if(doc[\"student_id\"].value == 2){
return params._source[\"test_scores\"]
.stream()
.filter(nested ->
nested.test_id == 101 || nested.test_id == 103)
.mapToDouble(nested -> nested.score)
.max()
.orElse(0)
} else {
return 0
}"
}
}
},
"query": {
//filtering query copied here
}
}
Response:
"aggregations" : {
"score_stats" : {
"count" : 2,
"min" : 85.0,
"max" : 90.0,
"avg" : 87.5,
"sum" : 175.0
}
}
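This solution works for the simple query above, but my real query can be much more complex. The approach does not scale, because there is an upper limit on script length.
I tried using a filter aggregation inside a nested aggregation, but once inside the nested path it seems impossible to apply AND / OR conditions on non-nested fields.
Is there a better way to run a stats aggregation on a nested field so that only the maximum of the selected nested values per document is considered for the stats evaluation?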