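Short question
How can I sum buckets based on part of a composite key?
Detailed question
For a number of days, I have the stock of certain products at certain stores that belong to certain retailers. Now I want the average stock per product per retailer, summed over the stores.
The following query does what I want: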
{
  "size": 0,
  "aggs": {
    "docs": {
      "composite": {
        "sources": [
          {"retailer": {"terms": {"field": "retailer"}}},
          {"ean": {"terms": {"field": "ean"}}},
        ],
        "size": 10000,
      },
      "aggs": {
        "average_quantity_per_store": {
          "terms": {"field": "store", "size": 1000},
          "aggs": {
            "average_quantity": {
              "avg": {"field": "quantity"}
            }
          }
        },
        "average_quantity": {
          "sum_bucket": {"buckets_path": "average_quantity_per_store>average_quantity"}
        }
      }
    },
  },
  "query": filter_query
}
The result looks like this:
{
  "took": 3885,
  "timed_out": false,
  "hits": {
    "total": 137960,
    "max_score": 0.0,
    "hits": []
  },
  "aggregations": {
    "docs": {
      "after_key": {
        "retailer": "some_retailer",
        "ean": "some_ean"
      },
      "buckets": [
        {
          "key": {
            "retailer": "a_retailer",
            "ean": "an_ean"
          },
          "doc_count": 29,
          "average_quantity_per_store": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "Store 1",
                "doc_count": 5,
                "average_quantity": {
                  "value": 2.0
                }
              },
              {
                "key": "Store 2",
                "doc_count": 4,
                "average_quantity": {
                  "value": 1.0
                }
              }
            ]
          },
          "average_quantity": {
            "value": 3.0
          }
        }
      ]
    }
  }
}
I am basically only interested in the 3.0 here. The query takes quite long (almost 4 seconds). I hoped to optimize it with the following query:
{
  "size": 0,
  "aggs": {
    "docs": {
      "composite": {
        "sources": [
          {"retailer": {"terms": {"field": "retailer"}}},
          {"ean": {"terms": {"field": "ean"}}},
          {"store": {"terms": {"field": "store"}}},
        ],
        "size": 10000,
      },
      "aggs": {
        "average_quantity": {
          "avg": {"field": "quantity"}
        }
      }
    },
  },
  "query": filter_query
}
If I run this, the result is:
{
  "took": 577,
  "timed_out": false,
  "hits": {
    "total": 137960,
    "max_score": 0.0,
    "hits": []
  },
  "aggregations": {
    "docs": {
      "after_key": {
        "retailer": "some_retailer",
        "ean": "some_ean",
        "store": "some_store"
      },
      "buckets": [
        {
          "key": {
            "retailer": "a_retailer",
            "ean": "an_ean",
            "store": "Store 1"
          },
          "doc_count": 5,
          "average_quantity": {
            "value": 2.0
          }
        },
        {
          "key": {
            "retailer": "a_retailer",
            "ean": "an_ean",
            "store": "Store 2"
          },
          "doc_count": 4,
          "average_quantity": {
            "value": 1.0
          }
        }
      ]
    }
  }
}
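It seems to me that the main part of the work is done this way, and it is much faster (577 ms instead of 3885 ms). What I still need, however, is to sum the buckets that share the same retailer and EAN combination across the different stores.
1) Is there a way in Elasticsearch to sum certain buckets based on parts of the composite key?
The alternative would be to do the post-processing in the Python application that calls Elasticsearch in the first place.
2) Would that be recommended? How much of a penalty would the larger data transfer cost me?
Note: I am using Elasticsearch 6.4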
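For reference, the post-processing alternative could look roughly like the sketch below. It assumes responses shaped like the second one above, and that the returned `after_key` is passed back as the composite aggregation's `after` parameter to fetch the next page. The names `iter_composite_buckets` and `sum_store_averages`, and the `search` callable (any function that takes a query body and returns the response as a dict), are hypothetical and only serve the illustration.

```python
import copy
from collections import defaultdict


def iter_composite_buckets(search, body, agg_name="docs"):
    """Yield every bucket of a composite aggregation, page by page.

    `search` is a hypothetical callable: query body in, response dict out.
    """
    body = copy.deepcopy(body)  # do not mutate the caller's query
    while True:
        agg = search(body)["aggregations"][agg_name]
        buckets = agg["buckets"]
        if not buckets:
            return
        yield from buckets
        after_key = agg.get("after_key")
        if after_key is None:
            return
        # Ask for the page that starts right after the last returned key.
        body["aggs"][agg_name]["composite"]["after"] = after_key


def sum_store_averages(buckets):
    """Sum the per-store average_quantity for each (retailer, ean) pair."""
    totals = defaultdict(float)
    for bucket in buckets:
        key = bucket["key"]
        value = bucket["average_quantity"]["value"]
        if value is not None:  # skip buckets without a numeric average
            totals[(key["retailer"], key["ean"])] += value
    return dict(totals)
```

With the Python client, `search` might be something like `lambda body: es.search(index="stock", body=body)`, where the index name `stock` is made up. Because the totals are accumulated across pages, it does not matter if the stores of one retailer/EAN pair end up split over two pages.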