Question

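Using a simple but somewhat contrived example, let's say I have several inventory documents stored in ElasticSearch, where each document represents either the purchase or the sale of an item: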

[
{item_id: "foobar", type: "cost", value: 12.34, timestamp:149382734621},
{item_id: "bizbaz", type: "sale", value: 45.12, timestamp:149383464621},
{item_id: "foobar", type: "sale", value: 32.74, timestamp:149384824621},
{item_id: "foobar", type: "cost", value: 12.34, timestamp:149387435621},
{item_id: "bizbaz", type: "sale", value: 45.12, timestamp:149388434621},
{item_id: "bizbaz", type: "cost", value: 41.23, timestamp:149389424621},
{item_id: "foobar", type: "sale", value: 32.74, timestamp:149389914621},
{item_id: "waahoo", type: "sale", value: 11.23, timestamp:149389914621},
...
]

For a specified time range, I would like to compute the current profit for each item. For example, I would like to return:

foobar_profit = sum(value of all documents item_id="foobar" and type="sale")
               -sum(value of all documents item_id="foobar" and type="cost")
bizbaz_profit = sum(value of all documents item_id="bizbaz" and type="sale")
               -sum(value of all documents item_id="bizbaz" and type="cost")
...

There are two aspects of this that I don't yet understand how to implement.

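1. I know how to aggregate over terms, so that would let me sum the value of all "foobar" items regardless of type. But I don't know how to sum over all documents that match on two fields. For example, I would like to aggregate the above data set over the composite key (item_id, type). The above data set would then yield the aggregations:
- (foobar, cost) -> 24.68
- (foobar, sale) -> 65.48
- (bizbaz, cost) -> 41.23
- (bizbaz, sale) -> 90.24
- (waahoo, sale) -> 11.23

2. Assuming I can do #1, I would have aggregations such as foobar_cost and foobar_sale. But I don't know how to combine two aggregations so that, in this case, foobar_profit = foobar_sale - foobar_cost. The above aggregations would then become:
- foobar_profit -> 40.8
- bizbaz_profit -> 49.01
- waahoo_profit -> 11.23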
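
To make the two steps concrete, here is a minimal client-side sketch in Python that reproduces the numbers above from the sample documents. It is only an illustration of the desired result, not a server-side solution; the function names (`sums_by_item_and_type`, `profit_by_item`, `top_n_by_profit`) are made up for this example.

```python
from collections import defaultdict

# Sample documents copied from the question.
docs = [
    {"item_id": "foobar", "type": "cost", "value": 12.34, "timestamp": 149382734621},
    {"item_id": "bizbaz", "type": "sale", "value": 45.12, "timestamp": 149383464621},
    {"item_id": "foobar", "type": "sale", "value": 32.74, "timestamp": 149384824621},
    {"item_id": "foobar", "type": "cost", "value": 12.34, "timestamp": 149387435621},
    {"item_id": "bizbaz", "type": "sale", "value": 45.12, "timestamp": 149388434621},
    {"item_id": "bizbaz", "type": "cost", "value": 41.23, "timestamp": 149389424621},
    {"item_id": "foobar", "type": "sale", "value": 32.74, "timestamp": 149389914621},
    {"item_id": "waahoo", "type": "sale", "value": 11.23, "timestamp": 149389914621},
]


def sums_by_item_and_type(documents):
    """Step 1: sum `value` over the composite key (item_id, type)."""
    totals = defaultdict(float)
    for doc in documents:
        totals[(doc["item_id"], doc["type"])] += doc["value"]
    return dict(totals)


def profit_by_item(documents):
    """Step 2: profit = sum of sales - sum of costs, per item_id."""
    profit = defaultdict(float)
    for (item_id, doc_type), total in sums_by_item_and_type(documents).items():
        # Sales count positively, costs negatively; items without cost
        # documents (such as "waahoo") simply keep their sales total.
        profit[item_id] += total if doc_type == "sale" else -total
    return dict(profit)


def top_n_by_profit(documents, n):
    """Sort items by profit, highest first, and keep the first n."""
    ranked = sorted(profit_by_item(documents).items(),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    for item_id, profit in top_n_by_profit(docs, 3):
        print(f"{item_id}_profit -> {profit:.2f}")
```

The values are floats, so round them (as the `:.2f` format does) before comparing against the figures listed above.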

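Some final notes:

In the example above I listed only 3 item_ids, but assume there will be thousands of item_ids, so I can't issue a special-case query for each item_id.
Also, for a given item, the cost and sale documents arrive at different times, so we can't put the cost and the sale price in the same document.
I could send all the data back and do the last aggregation step on the client side, but that could be a lot of data. In practice I need to do this on the server side so that I can sort the results by profit and return the top N.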

Answer 1

You can use nested aggregations. See here for a working example: https://gist.github.com/mattweber/71033b1bf2ebed1afd8e

I use a MatchAll query in this example, but you can replace it with a RangeQuery or whatever you need.
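
As a rough illustration of what a nested-aggregation request could look like (this is a sketch, not necessarily what the linked gist does), the Python function below builds a search body as a plain dict. It assumes an Elasticsearch version that supports the `bucket_script` and `bucket_sort` pipeline aggregations, and that `item_id` and `type` are mapped as exact-value (keyword / not analyzed) fields. The function name `build_profit_query`, the aggregation names, and the `max_items` cap are hypothetical.

```python
import json


def build_profit_query(start_ts, end_ts, top_n, max_items=10000):
    """Build a search body that computes per-item profit on the server.

    max_items is an assumed upper bound on distinct item_ids; bucket_sort
    only reorders the buckets that the terms aggregation returns, so the
    bound has to cover every item that could make the top N.
    """
    def sum_for(doc_type):
        # Single-bucket filter on `type`, with a sum of `value` inside it.
        return {
            "filter": {"term": {"type": doc_type}},
            "aggs": {"total": {"sum": {"field": "value"}}},
        }

    return {
        "size": 0,  # only aggregation results are needed, no hits
        "query": {"range": {"timestamp": {"gte": start_ts, "lte": end_ts}}},
        "aggs": {
            "by_item": {
                "terms": {"field": "item_id", "size": max_items},
                "aggs": {
                    "sales": sum_for("sale"),
                    "costs": sum_for("cost"),
                    # profit = sales - costs, computed per item bucket
                    "profit": {
                        "bucket_script": {
                            "buckets_path": {
                                "sales": "sales>total",
                                "costs": "costs>total",
                            },
                            "script": "params.sales - params.costs",
                        }
                    },
                    # keep only the top_n buckets, highest profit first
                    "top_by_profit": {
                        "bucket_sort": {
                            "sort": [{"profit": {"order": "desc"}}],
                            "size": top_n,
                        }
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    body = build_profit_query(149382734621, 149389914621, top_n=10)
    print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request against the index that holds the inventory documents. Each bucket under `by_item` would then carry `sales.total.value`, `costs.total.value`, and `profit.value`.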

ElasticSearch Aggregations: subtracting aggregations based on a match
