Composite aggregation with a sub-aggregation

Asked: 2018-10-05 08:45:14

Tags: elasticsearch query-performance composite-key elasticsearch-aggregation

Short question

How can I sum buckets based on part of a composite key?

Detailed question

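I have stock data, covering several days, for certain products in certain stores that belong to certain retailers. I now want the average stock of each product for each retailer, summed over that retailer's stores.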

The following query does what I want:

{
    "size": 0,
    "aggs": {
        "docs": {
            "composite": {
                "sources": [
                    {"retailer": {"terms": {"field": "retailer"}}},
                    {"ean": {"terms": {"field": "ean"}}},
                ],
                "size": 10000,
            },
            "aggs": {
                "average_quantity_per_store": {
                    "terms": {"field": "store", "size": 1000},
                    "aggs": {
                        "average_quantity": {
                            "avg": {"field": "quantity"}
                        }
                    }
                },
                "average_quantity": {
                    "sum_bucket": {"buckets_path": "average_quantity_per_store>average_quantity"}
                }
            }
        },
    },
    "query": filter_query
}

The result looks like this:

{
    "took": 3885,
    "timed_out": false,
    "hits": {
        "total": 137960,
        "max_score": 0.0,
        "hits": []
    },
    "aggregations": {
        "docs": {
            "after_key": {
                "retailer": "some_retailer",
                "ean": "some_ean"
            },
            "buckets": [
                {
                    "key": {
                        "retailer": "a_retailer",
                        "ean": "an_ean"
                    },
                    "doc_count": 29,
                    "average_quantity_per_store": {
                        "doc_count_error_upper_bound": 0,
                        "sum_other_doc_count": 0,
                        "buckets": [
                            {
                                "key": "Store 1",
                                "doc_count": 5,
                                "average_quantity": {
                                    "value": 2.0
                                }
                            },
                            {
                                "key": "Store 2",
                                "doc_count": 4,
                                "average_quantity": {
                                    "value": 1.0
                                }
                            }
                        ]
                    },
                    "average_quantity": {
                        "value": 3.0
                    }
                }
            ]
        }
    }
}

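I am basically only interested in the 3.0 here, which is the sum of the per-store averages (2.0 + 1.0). The query takes rather long (almost 4 seconds). I was hoping to optimize it with the following query: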

{
    "size": 0,
    "aggs": {
        "docs": {
            "composite": {
                "sources": [
                    {"retailer": {"terms": {"field": "retailer"}}},
                    {"ean": {"terms": {"field": "ean"}}},
                    {"store": {"terms": {"field": "store"}}},
                ],
                "size": 10000,
            },
            "aggs": {
                "average_quantity": {
                    "avg": {"field": "quantity"}
                }
            }
        },
    },
    "query": filter_query
}

If I run this, the result is:

{
    "took": 577,
    "timed_out": false,
    "hits": {
        "total": 137960,
        "max_score": 0.0,
        "hits": []
    },
    "aggregations": {
        "docs": {
            "after_key": {
                "retailer": "some_retailer",
                "ean": "some_ean",
                "store": "some_store"
            },
            "buckets": [
                {
                    "key": {
                        "retailer": "a_retailer",
                        "ean": "an_ean",
                        "store": "Store 1"
                    },
                    "doc_count": 5,
                    "average_quantity": {
                        "value": 2.0
                    }
                },
                {
                    "key": {
                        "retailer": "a_retailer",
                        "ean": "an_ean",
                        "store": "Store 2"
                    },
                    "doc_count": 4,
                    "average_quantity": {
                        "value": 1.0
                    }
                }
            ]
        }
    }
}

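It seems to me that the main part of the work is done this way, and it is much faster (577 ms instead of 3885 ms). However, what I still need is to sum the buckets that share the same retailer and EAN combination across stores.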

1) Is there a way in ElasticSearch to sum certain buckets based on parts of a composite key?

The alternative would be to do the post-processing in the Python application that calls ElasticSearch in the first place.
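To make that alternative concrete, here is a minimal sketch of the client-side step, assuming the buckets have the shape shown in the second response above. The function name `sum_over_stores` is made up for this illustration; the input is the plain list found under `aggregations.docs.buckets`.

```python
from collections import defaultdict


def sum_over_stores(buckets):
    """Sum the per-store averages for each (retailer, ean) pair.

    `buckets` is assumed to be the list under aggregations.docs.buckets
    from the three-source composite query (retailer, ean, store).
    """
    totals = defaultdict(float)
    for bucket in buckets:
        key = bucket["key"]
        value = bucket["average_quantity"]["value"]
        if value is None:
            # avg reports null when no document in the bucket has a quantity
            continue
        totals[(key["retailer"], key["ean"])] += value
    return dict(totals)


# Buckets copied from the second response in the question.
buckets = [
    {
        "key": {"retailer": "a_retailer", "ean": "an_ean", "store": "Store 1"},
        "doc_count": 5,
        "average_quantity": {"value": 2.0},
    },
    {
        "key": {"retailer": "a_retailer", "ean": "an_ean", "store": "Store 2"},
        "doc_count": 4,
        "average_quantity": {"value": 1.0},
    },
]

totals = sum_over_stores(buckets)
print(totals)
```

For the two buckets above this gives 3.0 for ("a_retailer", "an_ean"), the same value the first query returns through sum_bucket.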

2) Would this be recommended? How much of a penalty would I pay for the larger data transfer?
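I cannot put a number on the transfer cost, but one way to keep each response small is to page through the composite aggregation with a smaller "size", feeding the returned after_key back in as the "after" parameter. The sketch below assumes a hypothetical `search` callable that takes the request body as a dict and returns the parsed response as a dict; `iter_composite_buckets` and `fake_search` are names invented here, and `fake_search` only stands in for a real ElasticSearch call so the example runs on its own.

```python
import copy


def iter_composite_buckets(search, body, agg_name="docs"):
    """Yield every bucket of a composite aggregation, one page at a time.

    `search` is a hypothetical callable: request body (dict) in,
    parsed response (dict) out.
    """
    body = copy.deepcopy(body)  # do not modify the caller's request
    while True:
        agg = search(body)["aggregations"][agg_name]
        buckets = agg["buckets"]
        if not buckets:
            break
        yield from buckets
        after_key = agg.get("after_key")
        if after_key is None:
            break
        # Ask for the page that follows the last bucket just received.
        body["aggs"][agg_name]["composite"]["after"] = after_key


def fake_search(body):
    """Stand-in for a real ElasticSearch call, serving two fixed buckets."""
    all_buckets = [
        {"key": {"retailer": "a_retailer", "ean": "an_ean", "store": "Store 1"},
         "doc_count": 5, "average_quantity": {"value": 2.0}},
        {"key": {"retailer": "a_retailer", "ean": "an_ean", "store": "Store 2"},
         "doc_count": 4, "average_quantity": {"value": 1.0}},
    ]
    composite = body["aggs"]["docs"]["composite"]
    after = composite.get("after")
    start = 0
    if after is not None:
        start = [b["key"] for b in all_buckets].index(after) + 1
    page = all_buckets[start:start + composite["size"]]
    result = {"buckets": page}
    if page:
        result["after_key"] = page[-1]["key"]
    return {"aggregations": {"docs": result}}


request = {
    "size": 0,
    "aggs": {
        "docs": {
            "composite": {
                "sources": [
                    {"retailer": {"terms": {"field": "retailer"}}},
                    {"ean": {"terms": {"field": "ean"}}},
                    {"store": {"terms": {"field": "store"}}},
                ],
                "size": 1,  # deliberately tiny to force several pages
            },
            "aggs": {"average_quantity": {"avg": {"field": "quantity"}}},
        }
    },
}

pages = list(iter_composite_buckets(fake_search, request))
print(len(pages))
```

Because the buckets arrive as a generator, they could be summed per (retailer, EAN) as they stream in instead of holding the whole response in memory.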

Note: I am using ElasticSearch 6.4

0 answers