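I am trying to sort on the average of a multi-valued field called "rating_average". In the example below, the field holds the values [1,2,2]. I expect the average to be (1 + 2 + 2) / 3 = 1.66666667. Instead, I get an average of 1.5.
After some testing and a look at the extended stats, I found that this happens because the average is computed over the distinct values only, with duplicates dropped. The stats operator is therefore applied to the set [1,2] rather than to [1,2,2]. I verified this by adding an aggregations section to my query, to double-check that the average used by the sort block is the same as the average reported by the stats aggregation.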
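To make the arithmetic concrete, here is a minimal Python sketch (plain Python, not Elasticsearch code; the variable names are mine) comparing the average over all values with the average over the distinct values:

```python
from statistics import mean

# Values stored in the rating_average field of the example document.
ratings = [1, 2, 2]

# Average over every value, duplicates included: what I expect.
expected_avg = mean(ratings)

# Average over the distinct values only: what the sort appears to use.
observed_avg = mean(sorted(set(ratings)))

print(expected_avg, observed_avg)
```

The second figure is 1.5, which is the value I see in the sort output.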
The sample document is as follows:
{
  "_source": {
    "content_uri": "http://data.semint.co.uk/resource/testContent1",
    "rating_average": [
      "1",
      "2",
      "2"
    ],
    "forDesk": "http://data.semint.co.uk/resource/kMFMJd1rtKD"
  }
}
The query I am executing is as follows:
{
  "from": 0,
  "size": 20,
  "aggs": {
    "rating_stats": {
      "extended_stats": {
        "field": "rating_average"
      }
    }
  },
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "terms": {
                "mediaType": [
                  "http://data.semint.co.uk/resource/testMediaType3"
                ],
                "execution": "and"
              }
            }
          ]
        }
      }
    }
  },
  "fields": ["content_uri", "rating_average"],
  "sort": [
    {
      "rating_average": {
        "order": "desc",
        "mode": "avg"
      }
    }
  ]
}
These are the results I get when running the query against the document above:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": null,
    "hits": [
      {
        "_index": "travel_content6",
        "_type": "semantic-index",
        "_id": "http://data.semint.co.uk/resource/testContent1",
        "_score": null,
        "fields": {
          "content_uri": [
            "http://data.semint.co.uk/resource/testContent1"
          ],
          "rating_average": [1, 2, 2]
        },
        "sort": [
          1.5
        ]
      }
    ]
  },
  "aggregations": {
    "rating_stats": {
      "count": 2,
      "min": 1,
      "max": 2,
      "avg": 1.5,
      "sum": 3,
      "sum_of_squares": 5,
      "variance": 0.25,
      "std_deviation": 0.5,
      "std_deviation_bounds": {
        "upper": 2.5,
        "lower": 0.5
      }
    }
  }
}
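The figures in "rating_stats" can be reproduced by hand. The following is a rough Python sketch, not Elasticsearch's implementation: the function name `extended_stats` is hypothetical, the population-variance formula is my assumption, and the bounds of avg ± 2 standard deviations are inferred from the response above (1.5 ± 2 × 0.5).

```python
import math

def extended_stats(values, sigma=2.0):
    """Recompute the statistics shown in the response for a list of numbers.

    Hypothetical helper for illustration; sigma=2.0 is inferred from the
    std_deviation_bounds in the response, not taken from any configuration.
    """
    count = len(values)
    total = sum(values)
    sum_of_squares = sum(v * v for v in values)
    avg = total / count
    # Population variance: E[x^2] - (E[x])^2, clamped at 0 against rounding.
    variance = max(0.0, sum_of_squares / count - avg * avg)
    std_deviation = math.sqrt(variance)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "std_deviation": std_deviation,
        "std_deviation_bounds": {
            "upper": avg + sigma * std_deviation,
            "lower": avg - sigma * std_deviation,
        },
    }

distinct_stats = extended_stats([1, 2])   # duplicates dropped
full_stats = extended_stats([1, 2, 2])    # every stored value
print(distinct_stats)
print(full_stats)
```

With [1, 2] this gives count 2, sum 3, sum_of_squares 5, avg 1.5, variance 0.25 and std_deviation 0.5, which matches the response. With [1, 2, 2] the count would be 3, the sum 5 and the sum_of_squares 9, so the aggregation is clearly seeing only the two distinct values.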