I have daily statistics records for each mall. Using two of the fields, I apply bucket_script to get the ratio cpnTotalCount / orderTotalCount, and bucket_sort to get the top K.
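However, if I select only 7 days to find the top K malls, the result can be inaccurate because of doc_count_error_upper_bound: the document counts in a terms aggregation (and the results of any sub-aggregations) are not always accurate. Each shard provides its own view of the ordered list of terms, and these views are combined to produce the final view.
Is there another way to strike a better balance between accuracy and performance?
Any help would be greatly appreciated ;)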
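To make the shard-level inaccuracy concrete, here is a small sketch in Python. It is a simplified model and not Elasticsearch's actual implementation: the function `terms_agg`, the shard contents, and the tiny `shard_size` values are all hypothetical, chosen only so that the truncation is visible.

```python
from collections import Counter

def terms_agg(shards, size, shard_size):
    """Simplified model: each shard returns its own top `shard_size` terms,
    the coordinating node sums them and keeps the top `size`."""
    merged = Counter()
    for docs in shards:
        local = Counter(docs)
        # Order by count descending, then key ascending, like the query's "order".
        top = sorted(local.items(), key=lambda kv: (-kv[1], kv[0]))[:shard_size]
        for term, count in top:
            merged[term] += count
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))[:size]

# Hypothetical data: one list per shard, one element (a mallId) per document.
shards = [
    ["A"] * 5 + ["B"] * 4 + ["C"] * 3,
    ["A"] * 5 + ["C"] * 4 + ["B"] * 3,
]

exact = terms_agg(shards, size=2, shard_size=3)   # every term survives on every shard
approx = terms_agg(shards, size=2, shard_size=2)  # each shard drops its third term

print(exact)   # [('A', 10), ('B', 7)]
print(approx)  # [('A', 10), ('B', 4)]
```

In this toy case mall B is reported with 4 documents instead of 7, because the second shard never sent its count for B. Any sum computed as a sub-aggregation would be understated in the same way, which is why the ratio can be wrong even when the mall itself is returned.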
{
  "size": 10,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "orderTime": {
              "from": 1589385600000,
              "to": 1590249599999,
              "include_lower": true,
              "include_upper": true,
              "boost": 1.0
            }
          }
        },
        {
          "range": {
            "cpnTotalCount": {
              "from": 3,
              "to": null,
              "include_lower": true,
              "include_upper": true,
              "boost": 1.0
            }
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  },
  "aggs": {
    "es_aggs_bucketing": {
      "terms": {
        "field": "mallId",
        "size": 20,
        "shard_size": 10000,
        "min_doc_count": 1,
        "shard_min_doc_count": 0,
        "show_term_doc_count_error": false,
        "order": [
          {
            "_count": "desc"
          },
          {
            "_key": "asc"
          }
        ]
      },
      "aggregations": {
        "es_aggs_count_one": {
          "sum": {
            "field": "cpnTotalCount"
          }
        },
        "es_aggs_count_two": {
          "sum": {
            "field": "orderTotalCount"
          }
        },
        "es_aggs_sum_one": {
          "sum": {
            "field": "cpnTotalAmount"
          }
        },
        "es_aggs_script": {
          "bucket_script": {
            "buckets_path": {
              "orderCount": "es_aggs_count_two",
              "couponCount": "es_aggs_count_one"
            },
            "script": {
              "source": "params.couponCount/params.orderCount",
              "lang": "painless"
            },
            "gap_policy": "skip"
          }
        },
        "sort": {
          "bucket_sort": {
            "sort": [
              {
                "es_aggs_script": {
                  "order": "desc"
                }
              }
            ],
            "from": 0,
            "size": 40,
            "gap_policy": "SKIP"
          }
        }
      }
    }
  }
}
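For reference, the ratio and top-K logic that the query above expresses can be read as the following client-side computation. This is only a sketch for checking results on a small sample: the function name `top_k_by_coupon_ratio` and the sample records are hypothetical, and skipping malls with zero orders is an assumption about how such buckets should be treated.

```python
from collections import defaultdict

def top_k_by_coupon_ratio(records, k):
    """Sum cpnTotalCount and orderTotalCount per mallId, then rank by their ratio."""
    totals = defaultdict(lambda: [0, 0])  # mallId -> [coupon sum, order sum]
    for r in records:
        totals[r["mallId"]][0] += r["cpnTotalCount"]
        totals[r["mallId"]][1] += r["orderTotalCount"]
    # Assumption: malls with no orders are left out rather than ranked.
    ratios = {mall: c / o for mall, (c, o) in totals.items() if o > 0}
    # Highest ratio first, like bucket_sort with "order": "desc".
    return sorted(ratios.items(), key=lambda kv: -kv[1])[:k]

# Hypothetical daily records, one per mall per day.
records = [
    {"mallId": "m1", "cpnTotalCount": 3, "orderTotalCount": 10},
    {"mallId": "m1", "cpnTotalCount": 5, "orderTotalCount": 10},
    {"mallId": "m2", "cpnTotalCount": 9, "orderTotalCount": 10},
    {"mallId": "m3", "cpnTotalCount": 4, "orderTotalCount": 40},
]

result = top_k_by_coupon_ratio(records, k=2)
print(result)  # [('m2', 0.9), ('m1', 0.4)]
```

One difference is worth noting: this sketch ranks every mall, whereas in the query bucket_sort can only reorder the buckets that the terms aggregation returns, which are the 20 malls with the highest document count.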
Answer 0 (score: 0)
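If the dataset is not very large (in my case it may reach 150 GB within a year), one option is what I am trying now: storing the mall-level records in 30 shards and using mallId as the routing value, so that all records for a given mall land on the same shard and each mall-level ratio is accurate.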
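The idea behind routing by mallId can be sketched in Python as follows. This is an illustration only: `shard_for`, `place`, and `shards_per_mall` are hypothetical helpers, and CRC32 is a stand-in for Elasticsearch's own routing hash. The point is that a deterministic hash of the routing value sends every record of a mall to a single shard, so that shard's sums for the mall are complete.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 30  # shard count mentioned in the answer

def shard_for(routing_key, num_shards=NUM_SHARDS):
    # Stand-in hash for illustration; Elasticsearch uses its own routing hash.
    return zlib.crc32(routing_key.encode("utf-8")) % num_shards

def place(records):
    """Assign each record to a shard using its mallId as the routing value."""
    shards = defaultdict(list)
    for r in records:
        shards[shard_for(r["mallId"])].append(r)
    return shards

def shards_per_mall(shards):
    """Return, for each mallId, the set of shard ids that hold its records."""
    seen = defaultdict(set)
    for shard_id, docs in shards.items():
        for r in docs:
            seen[r["mallId"]].add(shard_id)
    return seen

# Hypothetical data: 100 malls with 7 daily records each.
records = [{"mallId": f"mall-{i}", "day": d} for i in range(100) for d in range(7)]
placement = place(records)

# Every mall's records end up on exactly one shard.
print(all(len(ids) == 1 for ids in shards_per_mall(placement).values()))  # True
```

The trade-off is that shard sizes now depend on how records are distributed across malls, so a few very large malls could make some shards much heavier than others.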