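I have data like the following in an Elasticsearch index. The output I expect is grouped by sku_id: I need the average rank over the whole date range, plus the first and the last value of last_7days_avg_rank within that date range (ordered by date), returned as two separate fields, as shown below.
Can someone tell me whether this is possible in Elasticsearch? Right now I do this calculation in the service layer, but the response time has become unacceptable, so I want to move the logic into ES itself. I can't figure out how to achieve this.
Input: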
date sku_id last_7days_avg_rank rank
20180101 S1 200 200
20180102 S1 210 200
20180105 S1 220 200
20180108 S1 230 200
20180101 S2 180 300
20180103 S2 200 300
20180106 S2 250 300
20180107 S2 300 300
Expected output:
sku first_val_last7day_avg last_val_last7days_avg avg(rank)
S1 200 230 200
S2 180 300 300
Thanks!
Answer 0 (score: 5)
You can get the desired result with aggregations (the date field is named my_date in this example):
{
  "size": 0,
  "aggs": {
    "GROUP": {
      "terms": {
        "field": "sku_id"
      },
      "aggs": {
        "AVG_RANK": {
          "avg": {
            "field": "rank"
          }
        },
        "FIRST_7_RANK": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "my_date": {
                  "order": "asc"
                }
              }
            ]
          }
        },
        "LAST_7_RANK": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "my_date": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
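The request above aggregates over every document in the index and, by default, a terms aggregation returns only the top 10 buckets. A possible variation, sketched in Python as a plain dict, restricts the search to a date range with a range query, raises the terms size, and uses _source filtering so each top_hits entry carries only the fields that are needed. The names build_request and max_skus are hypothetical, and the date strings are an assumption: they must match whatever date format the my_date mapping actually uses.

```python
import json


def build_request(date_from, date_to, max_skus=1000):
    """Build a search body: per-SKU average rank plus the first and last hit by date."""

    def edge_hit(order):
        # size 1 with a date sort keeps only the earliest ("asc") or latest ("desc") document
        return {
            "top_hits": {
                "size": 1,
                "sort": [{"my_date": {"order": order}}],
                # return only the fields the caller needs from the hit
                "_source": {"includes": ["my_date", "last_7days_avg_rank"]},
            }
        }

    return {
        "size": 0,  # no top-level hits, only aggregations
        # assumption: date_from/date_to are in the format of the my_date mapping
        "query": {"range": {"my_date": {"gte": date_from, "lte": date_to}}},
        "aggs": {
            "GROUP": {
                # terms returns 10 buckets unless size is set explicitly
                "terms": {"field": "sku_id", "size": max_skus},
                "aggs": {
                    "AVG_RANK": {"avg": {"field": "rank"}},
                    "FIRST_7_RANK": edge_hit("asc"),
                    "LAST_7_RANK": edge_hit("desc"),
                },
            }
        },
    }


body = build_request("20180101", "20180108")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request. The response shown next belongs to the original request body, not to this narrowed variant, so its _source entries still contain all fields.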
You will get a result like the following as output:
"aggregations": {
  "GROUP": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "S1",
        "doc_count": 40,
        "LAST_7_RANK": {
          "hits": {
            "total": 40,
            "max_score": null,
            "hits": [
              {
                "_index": "index_name",
                "_type": "type_name",
                "_id": "AWI9MU6JeKRzn3ttxGOr",
                "_score": null,
                "_source": {
                  "my_date": "2018-01-08",
                  "sku_id": "S1",
                  "last_7days_avg_rank": 230,
                  "rank": 200
                },
                "sort": [
                  1515369600000
                ]
              }
            ]
          }
        },
        "AVG_RANK": {
          "value": 200
        },
        "FIRST_7_RANK": {
          "hits": {
            "total": 40,
            "max_score": null,
            "hits": [
              {
                "_index": "index_name",
                "_type": "type_name",
                "_id": "AWI9LYVpeKRzn3ttxGOQ",
                "_score": null,
                "_source": {
                  "my_date": "20180101",
                  "sku_id": "S1",
                  "last_7days_avg_rank": 200,
                  "rank": 200
                },
                "sort": [
                  20180101
                ]
              }
            ]
          }
        }
      },
      {
        "key": "S2",
        "doc_count": 40,
        "LAST_7_RANK": {
          "hits": {
            "total": 40,
            "max_score": null,
            "hits": [
              {
                "_index": "index_name",
                "_type": "type_name",
                "_id": "AWI9MU6JeKRzn3ttxGOv",
                "_score": null,
                "_source": {
                  "my_date": "2018-01-07",
                  "sku_id": "S2",
                  "last_7days_avg_rank": 300,
                  "rank": 300
                },
                "sort": [
                  1515283200000
                ]
              }
            ]
          }
        },
        "AVG_RANK": {
          "value": 300
        },
        "FIRST_7_RANK": {
          "hits": {
            "total": 40,
            "max_score": null,
            "hits": [
              {
                "_index": "index_name",
                "_type": "type_name",
                "_id": "AWI9LYVpeKRzn3ttxGOU",
                "_score": null,
                "_source": {
                  "my_date": "20180101",
                  "sku_id": "S2",
                  "last_7days_avg_rank": 180,
                  "rank": 300
                },
                "sort": [
                  20180101
                ]
              }
            ]
          }
        }
      }
    ]
  }
}
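The result above contains two buckets (groups), one for S1 and one for S2. In each bucket, the AVG_RANK field gives the average rank for that group. For first_val_last7day_avg, follow the path "FIRST_7_RANK" -> "hits" -> "hits" -> "_source" -> "last_7days_avg_rank" (the inner "hits" is an array with a single element). Similarly, for last_val_last7days_avg, follow "LAST_7_RANK" -> "hits" -> "hits" -> "_source" -> "last_7days_avg_rank". I hope this helps.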
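The path-following described above can be sketched in Python as a small helper that turns the buckets into one flat row per SKU, matching the columns of the expected output in the question. It assumes the response has already been parsed into a dict by whatever client is in use. The function name flatten_buckets and the output key avg_rank are hypothetical. It reads last_7days_avg_rank from each top hit, since that is the field the expected output asks for.

```python
def flatten_buckets(response):
    """Turn the GROUP buckets of the aggregation response into one row per SKU."""
    rows = []
    for bucket in response["aggregations"]["GROUP"]["buckets"]:
        # each top_hits aggregation was requested with size 1, so take element 0
        first = bucket["FIRST_7_RANK"]["hits"]["hits"][0]["_source"]
        last = bucket["LAST_7_RANK"]["hits"]["hits"][0]["_source"]
        rows.append({
            "sku": bucket["key"],
            "first_val_last7day_avg": first["last_7days_avg_rank"],
            "last_val_last7days_avg": last["last_7days_avg_rank"],
            "avg_rank": bucket["AVG_RANK"]["value"],
        })
    return rows


# Trimmed-down response with the same shape as the S1 bucket shown above
sample = {
    "aggregations": {
        "GROUP": {
            "buckets": [
                {
                    "key": "S1",
                    "AVG_RANK": {"value": 200},
                    "FIRST_7_RANK": {"hits": {"hits": [
                        {"_source": {"last_7days_avg_rank": 200, "rank": 200}}
                    ]}},
                    "LAST_7_RANK": {"hits": {"hits": [
                        {"_source": {"last_7days_avg_rank": 230, "rank": 200}}
                    ]}},
                }
            ]
        }
    }
}

print(flatten_buckets(sample))
```

For the S1 bucket this yields first 200, last 230, and average rank 200, which is the S1 row of the expected output.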