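I'm new to Elasticsearch and need help with the following problem:
I have a set of documents, each containing multiple products. I want to filter on the product attribute product_brand with the value "Apple" and get the number of products that match the filter. The result should be grouped by document ID, which is also part of the document itself (test_id).
Sample document: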
"test" : {
"test_id" : 19988,
"test_name" : "Test"
},
"products" : [
{
"product_id" : 1,
"product_brand" : "Apple"
},
{
"product_id" : 2,
"product_brand" : "Apple"
},
{
"product_id" : 3,
"product_brand" : "Samsung"
}
]
The result should be:
{
"key" : 19988,
"count" : 2
},
In SQL it would look roughly like this:
SELECT test_id, COUNT(product_id)
FROM `test`
WHERE product_brand = 'Apple'
GROUP BY test_id;
How can I do this?
Answer 0 (score: 1)
I think this should get you pretty close:
GET /test/_search
{
"_source": {
"includes": [
"test.test_id",
"_score"
]
},
"query": {
"function_score": {
"query": {
"match": {
"products.product_brand.keyword": "Apple"
}
},
"functions": [
{
"script_score": {
"script": {
"source": "def matches=0; def products = params['_source']['products']; for(p in products){if(p.product_brand == params['brand']){matches++;}} return matches;",
"params": {
"brand": "Apple"
}
}
}
}
],
"boost_mode": "replace"
}
}
}
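This approach uses function_score, but you could apply the same script in a script field instead if you want to score differently. The query above only matches documents that have a child product object whose brand text is exactly "Apple". Note that function_score multiplies the query score by the function result by default, so _score equals the raw match count only with "boost_mode": "replace".
You just need to supply the same input for both occurrences of "Apple" (the match clause and the script's brand parameter). Alternatively, you could match everything in the function_score query and look only at the score. Your output might look like this: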
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 2,
"hits": [
{
"_index": "test",
"_type": "doc",
"_id": "AV99vrBpgkgblFY6zscA",
"_score": 2,
"_source": {
"test": {
"test_id": 19988
}
}
}
]
}
}
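To make it clearer what the script is counting, here is a rough client-side sketch in Python of the same logic, run against the sample document from the question. The function names count_brand_matches and group_counts are made up for illustration and are not part of Elasticsearch; this only mirrors the per-document loop in the Painless script and the SQL-style grouping, it is not how Elasticsearch executes the query.

```python
def count_brand_matches(doc, brand):
    """Count products in one document whose product_brand equals brand exactly.

    Mirrors the loop in the Painless script: iterate over products and
    increment a counter on an exact string match.
    """
    products = doc.get("products", [])
    return sum(1 for p in products if p.get("product_brand") == brand)


def group_counts(docs, brand):
    """Sum the match counts per test_id, skipping ids with no match
    (the equivalent of the WHERE clause in the SQL version)."""
    counts = {}
    for doc in docs:
        n = count_brand_matches(doc, brand)
        if n > 0:
            test_id = doc["test"]["test_id"]
            counts[test_id] = counts.get(test_id, 0) + n
    return [{"key": key, "count": count} for key, count in counts.items()]


# The sample document from the question.
sample = {
    "test": {"test_id": 19988, "test_name": "Test"},
    "products": [
        {"product_id": 1, "product_brand": "Apple"},
        {"product_id": 2, "product_brand": "Apple"},
        {"product_id": 3, "product_brand": "Samsung"},
    ],
}

print(group_counts([sample], "Apple"))  # [{'key': 19988, 'count': 2}]
```

The count of 2 here corresponds to the _score of 2 in the output above.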
The mapping of the index I used looks like this:
{
"test": {
"mappings": {
"doc": {
"properties": {
"products": {
"properties": {
"product_brand": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"product_id": {
"type": "long"
}
}
},
"test": {
"properties": {
"test_id": {
"type": "long"
},
"test_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
}
}
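Given this mapping, the match clause targets the products.product_brand.keyword sub-field, and the brand has to be supplied in two places in the request. One way to keep them in sync is to build the request body from a single argument. Below is a minimal Python sketch of that idea; the helper name build_brand_count_query is hypothetical, and the "boost_mode": "replace" setting is included on the assumption that you want _score to be exactly the match count.

```python
import json

# Painless script from the query above: counts products whose brand
# equals the 'brand' parameter.
COUNT_SCRIPT = (
    "def matches=0; def products = params['_source']['products']; "
    "for(p in products){if(p.product_brand == params['brand']){matches++;}} "
    "return matches;"
)


def build_brand_count_query(brand):
    """Build the function_score request body with the brand set in both places.

    Hypothetical helper for illustration; field names follow the mapping above.
    """
    return {
        "_source": {"includes": ["test.test_id"]},
        "query": {
            "function_score": {
                # Only documents with at least one product of this brand.
                "query": {"match": {"products.product_brand.keyword": brand}},
                "functions": [
                    {
                        "script_score": {
                            "script": {
                                "source": COUNT_SCRIPT,
                                "params": {"brand": brand},
                            }
                        }
                    }
                ],
                # Assumption: use the script result as the score instead of
                # multiplying it with the query score (the default).
                "boost_mode": "replace",
            }
        },
    }


body = build_brand_count_query("Apple")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of GET /test/_search, with the brand changed in one place only.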