How to aggregate over matching fields in an array in Elasticsearch

Time: 2015-05-05 14:36:09

Tags: elasticsearch

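I have objects with an array called properties. Each property is itself an object made up of the fields attribute and value (plus a few others that are not important here).

I want to find all values of a particular attribute.

My current approach is a filtered query on properties.attribute followed by an aggregation on properties.value. This is not sufficient, because the aggregation covers every property defined on the matching documents, not just the properties whose properties.attribute matches the search.

Is there a way to restrict the aggregation 'scope' to the properties whose properties.attribute matches?

For completeness, here is the curl call that returns too many values; I am only interested in 'farbe' (color):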

curl -XGET 'http://localhost:9200/pwo/Product/_search?size=0&pretty=true' -d '{
"query": {
  "filtered": {
    "query": { "match_all" : { } },
    "filter": {
      "bool": {
        "must": { "term": { "properties.attribute": "farbe" } }
      }
    }
  }
},
"aggregations": {
  "properties": {
    "terms": { "field": "properties.value" }
  }
 }
}'

1 answer:

Answer 0 (score: 1)

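If I understand correctly, a combination of the nested aggregation and the filter aggregation seems to do what you want.

You will have to set up your mapping with the nested type, though.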
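A rough way to see why the nested type matters: with the default object mapping, Elasticsearch flattens an array of objects into one multi-valued field per key, so the pairing between attribute and value is lost. The pure-Python sketch below only imitates that behaviour. The sample product and all function names are made up for illustration, and no Elasticsearch API is involved.

```python
# Hypothetical product with two properties (sample data, not from the question).
product = {
    "properties": [
        {"attribute": "farbe", "value": "rot"},
        {"attribute": "groesse", "value": "xl"},
    ]
}


def flatten_object_array(doc, field):
    """Imitate the default object mapping: one multi-valued field per key."""
    flat = {}
    for item in doc[field]:
        for key, val in item.items():
            flat.setdefault(f"{field}.{key}", []).append(val)
    return flat


def values_seen_flattened(doc, field, attribute):
    """Values a terms aggregation sees when the document matches as a whole."""
    flat = flatten_object_array(doc, field)
    if attribute in flat[f"{field}.attribute"]:
        return flat[f"{field}.value"]  # every value, the pairing is gone
    return []


def values_seen_nested(doc, field, attribute):
    """Values seen when each array item is filtered as its own nested document."""
    return [item["value"] for item in doc[field] if item["attribute"] == attribute]


print(values_seen_flattened(product, "properties", "farbe"))  # ['rot', 'xl']
print(values_seen_nested(product, "properties", "farbe"))     # ['rot']
```

The first call mirrors the situation in the question: the document matches on 'farbe', so all of its values end up in the aggregation.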

As a toy example, I set up a simple index as follows:

PUT /test_index
{
   "settings": {
      "number_of_shards": 1
   },
   "mappings": {
      "doc": {
         "properties": {
            "properties": {
               "type": "nested",
               "properties": {
                  "attribute": {
                     "type": "string"
                  },
                  "value": {
                     "type": "string"
                  }
               }
            }
         }
      }
   }
}

(Note that this is a little confusing, because in this case "properties" is both a mapping keyword and the name of the field being defined.)
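One thing to be aware of: a plain "string" field is analyzed, which is why the response further down contains single lowercase words as bucket keys instead of whole values. If one bucket per whole value is wanted (for example one per color), the fields can be left unanalyzed. The sketch below builds such an index body as a Python dict. It assumes the same 1.x-era mapping syntax used above, and the helper name is made up. On Elasticsearch 5 and later the "string" type was replaced by "text" and "keyword", so "keyword" would play this role there.

```python
import json


def nested_mapping(doc_type="doc"):
    """Build an index body like the one above, with unanalyzed string fields."""
    exact_string = {"type": "string", "index": "not_analyzed"}
    return {
        "settings": {"number_of_shards": 1},
        "mappings": {
            doc_type: {
                "properties": {
                    "properties": {
                        "type": "nested",
                        "properties": {
                            "attribute": dict(exact_string),
                            "value": dict(exact_string),
                        },
                    }
                }
            }
        },
    }


print(json.dumps(nested_mapping(), indent=3))
```

With unanalyzed fields, a term filter has to match the stored attribute exactly, including case.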

Now I can index a few documents:

POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"properties":[{"attribute":"lorem","value":"Donec a diam lectus."},{"attribute":"ipsum","value":"Sed sit amet ipsum mauris."}]}
{"index":{"_id":2}}
{"properties":[{"attribute":"dolor","value":"Donec et mollis dolor."},{"attribute":"sit","value":"Donec sed odio eros."}]}
{"index":{"_id":3}}
{"properties":[{"attribute":"amet","value":"Vivamus fermentum semper porta."}]}
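The bulk body is newline-delimited JSON: an action line followed by a source line for each document, ending with a final newline. As a small sketch, the same body can be generated from Python data. The helper name is hypothetical, and the resulting string would be sent as the body of the POST shown above.

```python
import json


def bulk_body(docs):
    """Render {id: source} pairs as newline-delimited JSON for the _bulk endpoint."""
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the bulk API expects a final newline


docs = {
    1: {"properties": [{"attribute": "lorem", "value": "Donec a diam lectus."},
                       {"attribute": "ipsum", "value": "Sed sit amet ipsum mauris."}]},
    2: {"properties": [{"attribute": "dolor", "value": "Donec et mollis dolor."},
                       {"attribute": "sit", "value": "Donec sed odio eros."}]},
    3: {"properties": [{"attribute": "amet", "value": "Vivamus fermentum semper porta."}]},
}

print(bulk_body(docs), end="")
```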

Then I can get an aggregation on "properties.value", filtered by "properties.attribute", as follows:

POST /test_index/_search?search_type=count
{
   "aggs": {
      "nested_properties": {
         "nested": {
            "path": "properties"
         },
         "aggs": {
            "filtered_by_attribute": {
               "filter": {
                  "terms": {
                     "properties.attribute": [
                        "lorem",
                        "amet"
                     ]
                  }
               },
               "aggs": {
                  "value_terms": {
                     "terms": {
                        "field": "properties.value"
                     }
                  }
               }
            }
         }
      }
   }
}
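For the original question, the same shape with a single attribute such as 'farbe' should be enough. The sketch below builds that request body in Python; the function name and its parameters are made up for illustration. It uses "size": 0 in the body instead of search_type=count, since that search type was removed in later Elasticsearch versions and "size": 0 gives the same effect of returning no hits.

```python
import json


def values_for_attribute(attribute, path="properties", size=10):
    """Build a search body that aggregates the values of one attribute only."""
    return {
        "size": 0,  # no hits needed, only the aggregation
        "aggs": {
            "nested_properties": {
                "nested": {"path": path},
                "aggs": {
                    "filtered_by_attribute": {
                        "filter": {"term": {f"{path}.attribute": attribute}},
                        "aggs": {
                            "value_terms": {
                                "terms": {"field": f"{path}.value", "size": size}
                            }
                        },
                    }
                },
            }
        },
    }


print(json.dumps(values_for_attribute("farbe"), indent=3))
```

The printed JSON can be posted to the index's _search endpoint in the same way as the request above.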

In this case it returns:

{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "nested_properties": {
         "doc_count": 5,
         "filtered_by_attribute": {
            "doc_count": 2,
            "value_terms": {
               "doc_count_error_upper_bound": 0,
               "sum_other_doc_count": 0,
               "buckets": [
                  {
                     "key": "a",
                     "doc_count": 1
                  },
                  {
                     "key": "diam",
                     "doc_count": 1
                  },
                  {
                     "key": "donec",
                     "doc_count": 1
                  },
                  {
                     "key": "fermentum",
                     "doc_count": 1
                  },
                  {
                     "key": "lectus",
                     "doc_count": 1
                  },
                  {
                     "key": "porta",
                     "doc_count": 1
                  },
                  {
                     "key": "semper",
                     "doc_count": 1
                  },
                  {
                     "key": "vivamus",
                     "doc_count": 1
                  }
               ]
            }
         }
      }
   }
}
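The numbers in this response can be checked by hand: the three documents contain 5 nested property objects in total, 2 of them have the attribute "lorem" or "amet", and the buckets are the individual words of those two values, because the value field is an analyzed string. The pure-Python sketch below reproduces that. It is only an approximation: a lowercase word split stands in for the standard analyzer, and the function name is made up.

```python
import re
from collections import Counter

docs = [
    {"properties": [{"attribute": "lorem", "value": "Donec a diam lectus."},
                    {"attribute": "ipsum", "value": "Sed sit amet ipsum mauris."}]},
    {"properties": [{"attribute": "dolor", "value": "Donec et mollis dolor."},
                    {"attribute": "sit", "value": "Donec sed odio eros."}]},
    {"properties": [{"attribute": "amet", "value": "Vivamus fermentum semper porta."}]},
]


def simulate_nested_filter_terms(docs, attributes):
    """Imitate nested -> filter -> terms on the sample documents."""
    nested = [prop for doc in docs for prop in doc["properties"]]
    matching = [prop for prop in nested if prop["attribute"] in attributes]
    counts = Counter()
    for prop in matching:
        # Assumption: lowercase word tokens approximate the standard analyzer.
        counts.update(set(re.findall(r"\w+", prop["value"].lower())))
    # Terms buckets are ordered by count (descending), ties by key (ascending).
    buckets = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return len(nested), len(matching), buckets


total, matched, buckets = simulate_nested_filter_terms(docs, {"lorem", "amet"})
print(total, matched)              # 5 2
print([key for key, _ in buckets])
```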

Here is all the code I used, in one place:

http://sense.qbox.io/gist/1e0c58aae54090fadfde8856f4f6793b68de0167