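I currently have Elasticsearch indices of product data spanning one year, with one index per month (or possibly one per year, if I turn out to have less data than I expect). A daily process fetches the prices of all these products and indexes them into Elasticsearch. I am trying to build a query that gives me the percentage change for each product over the past 30 days.
Example documents: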
{
"timestamp": "2019-09-18T02:38:51.417Z",
"productId": 1,
"marketPrice": 5.00,
"lowPrice": 4.30
},
{
"timestamp": "2019-08-30T02:38:51.417Z", (THIS SHOULD BE IGNORED)**
"productId": 1,
"marketPrice": 100.00,
"lowPrice": 200.15
},
{
"timestamp": "2019-08-18T02:38:51.417Z",
"productId": 1,
"marketPrice": 10.00,
"lowPrice": 2.15
},
{
"timestamp": "2019-09-18T02:38:51.417Z",
"productId": 2,
"marketPrice": 2.00,
"lowPrice": 1.00
},
{
"timestamp": "2019-08-18T02:38:51.417Z",
"productId": 2,
"marketPrice": 3.00,
"lowPrice": 2.00
}
Example result:
{
"productId": 1,
"marketPriceChangeWithin30Days": 200%,
"lowPriceChangeWithin30Days": 200%
},
{
"productId": 2,
"marketPriceChangeWithin30Days": 150%,
"lowPriceChangeWithin30Days": 200%
}
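**(THIS SHOULD BE IGNORED) because the two values to compare are the one with the most recent timestamp and the one whose timestamp is closest to 30 days earlier.
The query would then return product IDs 1 and 2 along with their percentage changes, as shown in the example result.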
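To make the selection rule concrete, here is a minimal client-side sketch in Python of the comparison described above: for each product, take the document with the most recent timestamp and compare it with the document whose timestamp is closest to 30 days earlier. The function names (`parse_ts`, `pct_change`, `change_within_30_days`) are hypothetical, and the formula `(new - old) / old * 100` is an assumption; the percentages in the example result may follow a different convention.

```python
from datetime import datetime, timedelta


def parse_ts(ts):
    # Parses timestamps shaped like "2019-09-18T02:38:51.417Z".
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def pct_change(old, new):
    # Assumed convention: relative change from old to new, in percent.
    return (new - old) / old * 100.0


def change_within_30_days(docs, days=30):
    # Group the documents by productId.
    by_product = {}
    for doc in docs:
        by_product.setdefault(doc["productId"], []).append(doc)

    results = []
    for product_id, items in sorted(by_product.items()):
        # Most recent document for this product.
        latest = max(items, key=lambda d: parse_ts(d["timestamp"]))
        target = parse_ts(latest["timestamp"]) - timedelta(days=days)
        others = [d for d in items if d is not latest]
        if not others:
            continue  # nothing to compare against
        # Document whose timestamp is closest to 30 days before the latest.
        baseline = min(others, key=lambda d: abs(parse_ts(d["timestamp"]) - target))
        results.append({
            "productId": product_id,
            "marketPriceChangeWithin30Days": pct_change(
                baseline["marketPrice"], latest["marketPrice"]),
            "lowPriceChangeWithin30Days": pct_change(
                baseline["lowPrice"], latest["lowPrice"]),
        })
    return results
```

With the five example documents, the 2019-08-30 entry for product 1 is skipped because the 2019-08-18 entry lies closer to the 30-day mark.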
Answer 0 (score: 1)
You can leverage the derivative pipeline aggregation to achieve exactly what you expect:
POST /sales/_search
{
"size": 0,
"aggs": {
"sales_per_month": {
"date_histogram": {
"field": "timestamp",
"interval": "month"
},
"aggs": {
"marketPrice": {
"sum": {
"field": "marketPrice"
}
},
"lowPrice": {
"sum": {
"field": "lowPrice"
}
},
"marketPriceDiff": {
"derivative": {
"buckets_path": "marketPrice"
}
},
"lowPriceDiff": {
"derivative": {
"buckets_path": "lowPrice"
}
}
}
}
}
}
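The derivative aggregation yields absolute differences between consecutive buckets, not percentages, so the percentage still has to be computed from them. As a rough illustration, the Python sketch below mimics a first-order derivative over a plain list of per-bucket values and converts each difference into a percentage of the previous bucket. The helper names are hypothetical, and the sketch assumes every bucket has a value (no gaps).

```python
def derivative(values):
    # First-order difference between consecutive bucket values.
    # The first bucket has no predecessor, so it gets None.
    return [None] + [cur - prev for prev, cur in zip(values, values[1:])]


def percent_from_derivative(values):
    # Express each difference as a percentage of the previous bucket's value.
    diffs = derivative(values)
    percents = [None]
    for prev, diff in zip(values, diffs[1:]):
        percents.append(diff / prev * 100.0 if prev else None)
    return percents
```

For example, monthly values of `[10.0, 5.0]` give a derivative of `-5.0` for the second bucket, which corresponds to a change of -50% under this convention.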
Update:
Given your latest requirement, I suggest using the serial_diff pipeline aggregation with a lag of 30 days:
POST /sales/_search
{
"size": 0,
"query": {
"range": {
"timestamp": {
"gte": "now-31d",
"lte": "now"
}
}
},
"aggs": {
"products": {
"terms": {
"field": "productId",
"size": 10
},
"aggs": {
"histo": {
"date_histogram": {
"field": "timestamp",
"interval": "day",
"min_doc_count": 0
},
"aggs": {
"marketPrice": {
"avg": {
"field": "marketPrice"
}
},
"lowPrice": {
"avg": {
"field": "lowPrice"
}
},
"30d_diff_marketPrice": {
"serial_diff": {
"buckets_path": "marketPrice",
"lag": 30
}
},
"30d_diff_lowPrice": {
"serial_diff": {
"buckets_path": "lowPrice",
"lag": 30
}
}
}
}
}
}
}
}
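Like derivative, serial_diff returns an absolute difference (the current bucket's value minus the value 30 buckets earlier), so the percentage has to be derived on the client. The Python sketch below is one way to do that, assuming the response follows the usual nested bucket layout (`aggregations.products.buckets[].histo.buckets[]`) and that buckets without enough history simply lack the `30d_diff_*` key. The function name `percent_changes` and the output field names are hypothetical.

```python
def percent_changes(response, metrics=("marketPrice", "lowPrice")):
    results = []
    for product in response["aggregations"]["products"]["buckets"]:
        row = {"productId": product["key"]}
        for metric in metrics:
            diff_key = "30d_diff_" + metric
            # Scan daily buckets from newest to oldest and use the first one
            # that carries both a current value and a 30-day difference.
            for bucket in reversed(product["histo"]["buckets"]):
                diff = bucket.get(diff_key, {}).get("value")
                current = bucket.get(metric, {}).get("value")
                if diff is None or current is None:
                    continue
                previous = current - diff  # value 30 buckets earlier
                if previous != 0:
                    row[metric + "ChangeWithin30Days"] = diff / previous * 100.0
                break
        results.append(row)
    return results
```

Note that with `"min_doc_count": 0`, days without documents produce empty buckets whose averages are null, so a product only gets a value here if it has data on both ends of the 30-day lag.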