I'm running some queries against the products index in Elasticsearch. The products index contains the following documents:
{ "product_name": "prod-1", "meta": [ { "key": "key1", "value": "value1" }, { "key": "key2", "value": "value2" } ] }
{ "product_name": "prod-2", "meta": [ { "key": "key1", "value": "value1" } ] }
{ "product_name": "prod-2", "meta": [ { "key": "key2", "value": "value2" } ] }
{ "product_name": "prod-3", "meta": [ { "key": "key2", "value": "value2" } ] }
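What I want is the product_name values whose meta arrays contain both key1/value1 and key2/value2, although not necessarily in the same document. For example, in the data above, prod-1 has both key1/value1 and key2/value2 in the same document, so I want prod-1 in the result. prod-2 also has both key1/value1 and key2/value2, but in different documents, and I want prod-2 in the result as well. prod-3 only has key2/value2, even across all of its documents, so I don't want prod-3 in the result.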
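The selection rule described above is a subset test on the union of each product's pairs. Here is a minimal Python sketch of that rule. It assumes the four sample documents are held in memory as plain dicts. DOCS, REQUIRED and matching_products are illustrative names and are not part of any Elasticsearch API.

```python
from collections import defaultdict

# Sample documents from the products index, as plain Python dicts.
DOCS = [
    {"product_name": "prod-1", "meta": [{"key": "key1", "value": "value1"},
                                        {"key": "key2", "value": "value2"}]},
    {"product_name": "prod-2", "meta": [{"key": "key1", "value": "value1"}]},
    {"product_name": "prod-2", "meta": [{"key": "key2", "value": "value2"}]},
    {"product_name": "prod-3", "meta": [{"key": "key2", "value": "value2"}]},
]

# The pairs that must all appear somewhere among a product's documents.
REQUIRED = {("key1", "value1"), ("key2", "value2")}


def matching_products(docs, required):
    """Return the product names whose documents, taken together, contain every required pair."""
    pairs_by_product = defaultdict(set)
    for doc in docs:
        for entry in doc["meta"]:
            pairs_by_product[doc["product_name"]].add((entry["key"], entry["value"]))
    # Subset test: every required pair must be in the product's combined pairs.
    return sorted(name for name, pairs in pairs_by_product.items() if required <= pairs)


print(matching_products(DOCS, REQUIRED))  # ['prod-1', 'prod-2']
```

Because the pairs are merged per product before the test runs, prod-2 qualifies even though no single prod-2 document has both pairs.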
Here is what I'm trying. To check for key1/value1 and key2/value2, I group the documents by product_name and combine the meta fields within each bucket, as follows:
{
  "size": 0,
  "aggs": {
    "by_product": {
      "terms": {
        "field": "product_name"
      },
      "aggs": {
        "all_meta": {
          "top_hits": {
            "_source": {
              "includes": [
                "meta.key",
                "meta.value"
              ]
            }
          }
        }
      }
    }
  }
}
The result of the above aggregation looks like this:
"aggregations" : {
  "by_product" : {
    ...
    "buckets" : [
      {
        ...
        "key" : "prod-2",
        "all_meta" : {
          "hits" : {
            ...
            "hits" : [
              {
                ....
                "_source" : {
                  "meta" : [
                    {
                      "value" : "value1",
                      "key" : "key1"
                    }
                  ]
                }
              },
              {
                ....
                "_source" : {
                  "meta" : [
                    {
                      "value" : "value2",
                      "key" : "key2"
                    }
                  ]
                }
              }
            ]
          }
        }
      },
      {
        ....
        "key" : "prod-1",
        "all_meta" : {
          "hits" : {
            ....
            "hits" : [
              {
                ....
                "_source" : {
                  "meta" : [
                    {
                      "value" : "value1",
                      "key" : "key1"
                    },
                    {
                      "value" : "value2",
                      "key" : "key2"
                    }
                  ]
                }
              }
            ]
          }
        }
      },
      {
        ....
        "key" : "prod-3",
        "all_meta" : {
          "hits" : {
            ....
            "hits" : [
              {
                ....
                "_source" : {
                  "meta" : [
                    {
                      "value" : "value2",
                      "key" : "key2"
                    }
                  ]
                }
              }
            ]
          }
        }
      }
    ]
  }
}
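Here is a rough Python model of what this terms + top_hits aggregation does. It is an illustration only, and group_meta_by_product and top_hits_size are hypothetical names. Each product bucket collects the meta arrays of its hits. The model also makes one limit visible: top_hits returns at most 3 hits per bucket unless its size parameter is raised. A product with more documents than that would therefore be missing some meta entries in its bucket. Hits are kept in input order here, whereas Elasticsearch sorts top_hits by score by default.

```python
from collections import defaultdict

# Sample documents from the products index, in indexing order.
DOCS = [
    {"product_name": "prod-1", "meta": [{"key": "key1", "value": "value1"},
                                        {"key": "key2", "value": "value2"}]},
    {"product_name": "prod-2", "meta": [{"key": "key1", "value": "value1"}]},
    {"product_name": "prod-2", "meta": [{"key": "key2", "value": "value2"}]},
    {"product_name": "prod-3", "meta": [{"key": "key2", "value": "value2"}]},
]


def group_meta_by_product(docs, top_hits_size=3):
    """Roughly mimic a terms aggregation on product_name with a top_hits sub-aggregation.

    Each bucket collects the meta array of at most top_hits_size documents,
    in input order (a simplification of Elasticsearch's score ordering).
    """
    buckets = defaultdict(list)
    for doc in docs:
        hits = buckets[doc["product_name"]]
        if len(hits) < top_hits_size:
            hits.append(doc["meta"])
    return dict(buckets)


grouped = group_meta_by_product(DOCS)
print(len(grouped["prod-2"]))  # 2
```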
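Now, from the aggregation results, I want to keep only the buckets that contain both { "key": "key1", "value": "value1" } and { "key": "key2", "value": "value2" }, and get those buckets back. Something like this: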
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "buckets.all_meta.hits.hits._source.meta",
            "query": {
              "bool": {
                "must": [
                  {
                    "match": {
                      "buckets.all_meta.hits.hits._source.meta.key": "key1"
                    }
                  },
                  {
                    "match": {
                      "buckets.all_meta.hits.hits._source.meta.value": "value1"
                    }
                  }
                ]
              }
            }
          }
        },
        {
          "nested": {
            "path": "buckets.all_meta.hits.hits._source.meta",
            "query": {
              "bool": {
                "must": [
                  {
                    "match": {
                      "buckets.all_meta.hits.hits._source.meta.key": "key2"
                    }
                  },
                  {
                    "match": {
                      "buckets.all_meta.hits.hits._source.meta.value": "value2"
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}
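A query cannot be run against aggregation output, because queries match documents rather than buckets. One fallback is to filter the buckets on the client. Below is a minimal Python sketch. It assumes the response above has been parsed into a dict (for example with json.loads), with the elided fields dropped. RESPONSE, bucket_pairs and filter_buckets are illustrative names. The result is only correct if each bucket's top_hits actually returns all of that product's documents.

```python
# A trimmed copy of the aggregation response shown above (elided fields dropped).
RESPONSE = {
    "aggregations": {
        "by_product": {
            "buckets": [
                {"key": "prod-2", "all_meta": {"hits": {"hits": [
                    {"_source": {"meta": [{"value": "value1", "key": "key1"}]}},
                    {"_source": {"meta": [{"value": "value2", "key": "key2"}]}},
                ]}}},
                {"key": "prod-1", "all_meta": {"hits": {"hits": [
                    {"_source": {"meta": [{"value": "value1", "key": "key1"},
                                          {"value": "value2", "key": "key2"}]}},
                ]}}},
                {"key": "prod-3", "all_meta": {"hits": {"hits": [
                    {"_source": {"meta": [{"value": "value2", "key": "key2"}]}},
                ]}}},
            ]
        }
    }
}

REQUIRED = {("key1", "value1"), ("key2", "value2")}


def bucket_pairs(bucket):
    """Collect every (key, value) pair from the top_hits hits of one terms bucket."""
    pairs = set()
    for hit in bucket["all_meta"]["hits"]["hits"]:
        for entry in hit["_source"].get("meta", []):
            pairs.add((entry["key"], entry["value"]))
    return pairs


def filter_buckets(response, required):
    """Keep the product keys of buckets whose combined pairs include every required pair."""
    buckets = response["aggregations"]["by_product"]["buckets"]
    return [b["key"] for b in buckets if required <= bucket_pairs(b)]


print(filter_buckets(RESPONSE, REQUIRED))  # ['prod-2', 'prod-1']
```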
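But I'm not sure how to apply a query like that to the aggregation buckets. Is it possible to do this? This Stack Overflow question is similar, but it has no answers. Is there another way to get the result I want? Any help would be appreciated. Thanks.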
Answer (score: 1)
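Here is a solution. The idea is that, within each product bucket, we aggregate all key/value pairs (using a scripted terms aggregation). We then use a bucket_selector pipeline aggregation to keep only the product buckets that have two distinct pairs. The nested aggregation requires meta to be mapped with the nested type.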
POST products/_search
{
  "size": 0,
  "aggs": {
    "by_product": {
      "terms": {
        "field": "product_name.keyword"
      },
      "aggs": {
        "meta": {
          "nested": {
            "path": "meta"
          },
          "aggs": {
            "kv": {
              "terms": {
                "script": """
                  [doc['meta.key.keyword'].value, doc['meta.value.keyword'].value].join('-')
                """,
                "size": 10
              }
            }
          }
        },
        "selector": {
          "bucket_selector": {
            "buckets_path": {
              "count": "meta>kv._bucket_count"
            },
            "script": "params.count == 2"
          }
        }
      }
    }
  }
}
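To see why the selector keeps exactly prod-1 and prod-2, here is a small Python model of the same pipeline. It is a sketch, not a description of Elasticsearch internals, and kv_terms_by_product and bucket_selector are hypothetical helper names. Each nested meta entry becomes one "key-value" term per product, and meta>kv._bucket_count corresponds to the number of distinct terms.

```python
from collections import Counter, defaultdict

# Sample documents from the products index.
DOCS = [
    {"product_name": "prod-1", "meta": [{"key": "key1", "value": "value1"},
                                        {"key": "key2", "value": "value2"}]},
    {"product_name": "prod-2", "meta": [{"key": "key1", "value": "value1"}]},
    {"product_name": "prod-2", "meta": [{"key": "key2", "value": "value2"}]},
    {"product_name": "prod-3", "meta": [{"key": "key2", "value": "value2"}]},
]


def kv_terms_by_product(docs):
    """Mimic the nested + scripted terms aggregation.

    Each meta entry becomes one "key-value" term, counted per product,
    like the kv buckets under each by_product bucket.
    """
    kv = defaultdict(Counter)
    for doc in docs:
        for entry in doc["meta"]:
            kv[doc["product_name"]]["-".join([entry["key"], entry["value"]])] += 1
    return kv


def bucket_selector(kv_by_product, expected_count=2):
    """Mimic bucket_selector with "params.count == 2" on the number of kv buckets."""
    return sorted(name for name, terms in kv_by_product.items() if len(terms) == expected_count)


kv = kv_terms_by_product(DOCS)
print(dict(kv["prod-1"]))   # {'key1-value1': 1, 'key2-value2': 1}
print(bucket_selector(kv))  # ['prod-1', 'prod-2']
```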
In the result, you can see that we only get prod-1 and prod-2:
"buckets" : [
  {
    "key" : "prod-2",
    "doc_count" : 2,
    "meta" : {
      "doc_count" : 2,
      "kv" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          {
            "key" : "key1-value1",
            "doc_count" : 1
          },
          {
            "key" : "key2-value2",
            "doc_count" : 1
          }
        ]
      }
    }
  },
  {
    "key" : "prod-1",
    "doc_count" : 1,
    "meta" : {
      "doc_count" : 2,
      "kv" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          {
            "key" : "key1-value1",
            "doc_count" : 1
          },
          {
            "key" : "key2-value2",
            "doc_count" : 1
          }
        ]
      }
    }
  }
]
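One caveat: params.count == 2 tests how many distinct pairs a product has, not which pairs it has. It returns the expected products for the sample data only because no product carries any pair other than key1/value1 and key2/value2. A product with key1/value1 and some other pair would also pass, and a product with both required pairs plus a third one would be dropped. The Python sketch below compares the count-based condition with a membership check on the kv bucket keys, applied client-side to buckets shaped like the ones above. prod-4 and key3-value3 are hypothetical additions, not part of the original data, and by_count and by_membership are illustrative names.

```python
# kv buckets shaped like the response above, trimmed to the fields used here,
# plus a hypothetical prod-4 whose two pairs are key1-value1 and key3-value3.
BUCKETS = [
    {"key": "prod-2", "meta": {"kv": {"buckets": [{"key": "key1-value1", "doc_count": 1},
                                                   {"key": "key2-value2", "doc_count": 1}]}}},
    {"key": "prod-1", "meta": {"kv": {"buckets": [{"key": "key1-value1", "doc_count": 1},
                                                   {"key": "key2-value2", "doc_count": 1}]}}},
    {"key": "prod-4", "meta": {"kv": {"buckets": [{"key": "key1-value1", "doc_count": 1},
                                                   {"key": "key3-value3", "doc_count": 1}]}}},
]

REQUIRED_TERMS = {"key1-value1", "key2-value2"}


def kv_keys(bucket):
    """Return the set of "key-value" terms found in one product bucket."""
    return {kv["key"] for kv in bucket["meta"]["kv"]["buckets"]}


def by_count(buckets, count=2):
    """The answer's condition: exactly `count` distinct kv terms."""
    return [b["key"] for b in buckets if len(kv_keys(b)) == count]


def by_membership(buckets, required=REQUIRED_TERMS):
    """Stricter condition: every required kv term is present."""
    return [b["key"] for b in buckets if required <= kv_keys(b)]


print(by_count(BUCKETS))       # ['prod-2', 'prod-1', 'prod-4']
print(by_membership(BUCKETS))  # ['prod-2', 'prod-1']
```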