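I have a list of products, and each product has a list of offers (a nested list). A product has a price (the reference price), and each offer has its own price, "offers.price", which is the price at a reseller ("offers.shop").
For each shop, I need to count how many of its offers are equal to, lower than, and higher than the reference price, as well as how many offers the shop has in total (plus a few other statistics in the same query). Products also have other attributes, such as brand, category, and so on.
I have the following mapping: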
{
  "mappings": {
    "product": {
      "properties": {
        "user_id": {
          "type": "integer"
        },
        "fid": {
          "type": "keyword"
        },
        "name": {
          "type": "keyword",
          "fields": {
            "text": {
              "type": "text"
            }
          }
        },
        "brand": {
          "type": "keyword"
        },
        "ean": {
          "type": "keyword"
        },
        "price": {
          "type": "scaled_float",
          "scaling_factor": 100
        },
        "offers": {
          "type": "nested",
          "properties": {
            "product_fid": {
              "type": "keyword"
            },
            "shop": {
              "type": "keyword"
            },
            "product_price": {
              "type": "scaled_float",
              "scaling_factor": 100
            },
            "price": {
              "type": "scaled_float",
              "scaling_factor": 100
            },
            "source": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
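This is my query.
Basically, I first create the "offers" aggregation (nested), then "by_shop" (terms) to get one bucket per "offers.shop", and then another terms aggregation, "by_product_fid", because I need to take only one offer per "product_fid" (the one with the lowest price).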
{
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "term": {
            "user_id": 1
          }
        }
      ]
    }
  },
  "aggregations": {
    "offers": {
      "nested": {
        "path": "offers"
      },
      "aggregations": {
        "filtered": {
          "filter": {
            "bool": {
              "filter": []
            }
          },
          "aggregations": {
            "by_shop": {
              "terms": {
                "field": "offers.shop",
                "order": {
                  "_term": "asc"
                },
                "size": 10000
              },
              "aggregations": {
                "by_product_fid": {
                  "terms": {
                    "field": "offers.product_fid",
                    "size": 10000
                  },
                  "aggregations": {
                    "lowest_price": {
                      "min": {
                        "field": "offers.price"
                      }
                    },
                    "product_price": {
                      "min": {
                        "field": "offers.product_price"
                      }
                    },
                    "an_offer": {
                      "bucket_script": {
                        "buckets_path": {},
                        "script": "1"
                      }
                    },
                    "lower_offer": {
                      "bucket_script": {
                        "buckets_path": {
                          "lowest_price": "lowest_price",
                          "product_price": "product_price"
                        },
                        "script": "params.product_price > params.lowest_price ? 1 : 0"
                      }
                    },
                    "equal_offer": {
                      "bucket_script": {
                        "buckets_path": {
                          "lowest_price": "lowest_price",
                          "product_price": "product_price"
                        },
                        "script": "params.product_price == params.lowest_price ? 1 : 0"
                      }
                    },
                    "higher_offer": {
                      "bucket_script": {
                        "buckets_path": {
                          "lowest_price": "lowest_price",
                          "product_price": "product_price"
                        },
                        "script": "params.product_price < params.lowest_price ? 1 : 0"
                      }
                    }
                  }
                },
                "common_offers_count": {
                  "sum_bucket": {
                    "buckets_path": "by_product_fid>an_offer"
                  }
                },
                "lower_offers_count": {
                  "sum_bucket": {
                    "buckets_path": "by_product_fid>lower_offer"
                  }
                },
                "equal_offers_count": {
                  "sum_bucket": {
                    "buckets_path": "by_product_fid>equal_offer"
                  }
                },
                "higher_offers_count": {
                  "sum_bucket": {
                    "buckets_path": "by_product_fid>higher_offer"
                  }
                },
                "sorted": {
                  "bucket_sort": {
                    "gap_policy": "insert_zeros",
                    "from": 0,
                    "size": 50
                  }
                }
              }
            },
            "shop_count": {
              "cardinality": {
                "field": "offers.shop"
              }
            }
          }
        }
      }
    }
  }
}
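To make the intent of the aggregation concrete, here is a minimal plain-Python sketch of the result I am after. It only illustrates the counting logic (one offer per shop and "product_fid", the cheapest one, compared against the reference price); it is not how Elasticsearch executes the query. The function name summarize_offers and the sample data are made up for illustration, and the documents are assumed to be shaped like the mapping above.

```python
from collections import defaultdict

# Hypothetical sample documents shaped like the "product" mapping.
SAMPLE_PRODUCTS = [
    {"fid": "p1", "price": 10.0, "offers": [
        {"shop": "a", "product_fid": "p1", "product_price": 10.0, "price": 9.5},
        {"shop": "a", "product_fid": "p1", "product_price": 10.0, "price": 11.0},
        {"shop": "b", "product_fid": "p1", "product_price": 10.0, "price": 10.0},
    ]},
    {"fid": "p2", "price": 20.0, "offers": [
        {"shop": "a", "product_fid": "p2", "product_price": 20.0, "price": 25.0},
    ]},
]


def summarize_offers(products):
    """Count, per shop, cheapest offers below/equal to/above the reference price."""
    # (shop, product_fid) -> (lowest offer price, lowest product_price),
    # mirroring the two "min" sub-aggregations under by_product_fid.
    lowest = {}
    for product in products:
        for offer in product["offers"]:
            key = (offer["shop"], offer["product_fid"])
            price, ref = lowest.get(key, (offer["price"], offer["product_price"]))
            lowest[key] = (min(price, offer["price"]),
                           min(ref, offer["product_price"]))

    # Sum the per-product flags per shop, like the sum_bucket aggregations.
    stats = defaultdict(lambda: {"common": 0, "lower": 0, "equal": 0, "higher": 0})
    for (shop, _fid), (price, ref) in lowest.items():
        counts = stats[shop]
        counts["common"] += 1
        if price < ref:
            counts["lower"] += 1
        elif price == ref:
            counts["equal"] += 1
        else:
            counts["higher"] += 1
    return dict(stats)
```

Calling summarize_offers(SAMPLE_PRODUCTS) returns one dictionary of counters per shop, which corresponds to the "common_offers_count", "lower_offers_count", "equal_offers_count" and "higher_offers_count" values in each "by_shop" bucket.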
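The Elasticsearch query is slow: it takes about 4 seconds when the user has 1,000 products with roughly 30 offers each. There are currently 6,000 products in total across all users. In production there will be about 100,000 products per user. I think the second terms aggregation, "by_product_fid", slows it down a lot, but is there another way to get distinct records per "offers.product_fid" and pick the one with the lowest "offers.price"?
The field "offers.product_fid" is a copy of "fid" from the product.
Is there a way to look into Elasticsearch internals to find out where most of the query processing time is spent?
Or maybe I should design the mapping differently? I would appreciate any advice...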
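On the timing question, one thing I am considering is the Profile API: as far as I know, adding a top-level "profile": true to the search body makes Elasticsearch return per-shard timing for the query and for the aggregations. Below is a minimal sketch of how I would switch it on and flatten the aggregation timings. The helper names with_profiling and aggregation_timings are hypothetical, and the response layout ("profile" > "shards" > "aggregations", each node with "description", "time_in_nanos" and optional "children") is my assumption based on the Profile API documentation, so it may differ between versions.

```python
import copy


def with_profiling(search_body):
    """Return a copy of the search body with the Profile API switched on."""
    body = copy.deepcopy(search_body)
    body["profile"] = True
    return body


def aggregation_timings(response):
    """Flatten the profiled aggregation tree into (depth, description, ms) rows.

    Assumes the response layout described in the text above; missing keys
    are tolerated so that an unprofiled response simply yields no rows.
    """
    rows = []

    def walk(node, depth):
        millis = node.get("time_in_nanos", 0) / 1_000_000
        rows.append((depth, node.get("description"), millis))
        for child in node.get("children", []):
            walk(child, depth + 1)

    for shard in response.get("profile", {}).get("shards", []):
        for aggregation in shard.get("aggregations", []):
            walk(aggregation, 0)
    return rows
```

The idea is to send with_profiling(query) instead of the original body and then sort the rows from aggregation_timings(response) by time, to check whether "by_product_fid" really dominates.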