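Elasticsearch metrics aggregation: number of elements in an array

Time: 2015-08-05 19:47:34

Tags: elasticsearch aggregate

I want to do a fairly complex query/aggregation. I can't wrap my head around it because I have only just started using ES. The documents I have look like this: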

{
  "keyword": "some keyword",
  "items": [
    {
      "name":"my first item",
      "item_property_1":"A",
      ( other properties here )
    },
    {
      "name":"my second item",
      "item_property_1":"B",
      ( other properties here )
    },
    {
      "name":"my third item",
      "item_property_1":"A",
      ( other properties here )
    }
  ]
  ( other properties... )
},
{
  "keyword": "different keyword",
  "items": [
    {
      "name":"cool item",
      "item_property_1":"A",
      ( other properties here )
    },
    {
      "name":"awesome item",
      "item_property_1":"C",
      ( other properties here )
    }
  ]
  ( other properties... )
},
( other documents... )

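Now, what I want to do is, for each keyword, count how many items have each of the possible values that item_property_1 can take. That is, I want a bucket aggregation that would produce the following response: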

{
  "keyword": "some keyword",
  "item_property_1_aggregation": [
    {
      "key":"A",
      "count": 2
    },
    {
      "key":"B",
      "count": 1
    }
  ]
},
{
  "keyword": "different keyword",
  "item_property_1_aggregation": [
    {
      "key":"A",
      "count": 1
    },
    {
      "key":"C",
      "count": 1
    }
  ]
},
( other keywords... )

If a mapping is needed, could you also specify which one? I don't have any non-default mappings; I just dumped everything in.

EDIT: To save you some trouble, here is a bulk PUT of the previous example:

PUT /test/test/_bulk
{ "index": {}}
{  "keyword": "some keyword",  "items": [    {      "name":"my first item",      "item_property_1":"A"    },    {      "name":"my second item",      "item_property_1":"B"    },    {      "name":"my third item",      "item_property_1":"A"     }  ]}
{ "index": {}}
{  "keyword": "different keyword",  "items": [    {      "name":"cool item",      "item_property_1":"A"    },    {      "name":"awesome item",      "item_property_1":"C"    }  ]}

EDIT2:

I just tried this:

POST /test/test/_search
{
    "size":2,
    "aggregations": {
        "property_1_count": {
            "terms":{
                "field":"item_property_1"
            }
        }
    }
}

And got this:

"aggregations": {
   "property_1_count": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
         {
            "key": "a",
            "doc_count": 2
         },
         {
            "key": "b",
            "doc_count": 1
         },
         {
            "key": "c",
            "doc_count": 1
         }
      ]
   }
}

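Close, but no cigar. You can see what is happening: it buckets on every item_property_1 value, regardless of which keyword it belongs to. I'm sure the solution involves adding the right mapping, but I can't put my finger on it. Suggestions?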
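The buckets above can be reproduced offline. The following is a rough pure-Python sketch, not Elasticsearch code, under two assumptions about the default (dynamic) mapping: the array of item objects is flattened into one multi-valued field per document, and string values are lowercased by the default analyzer. The function name flat_terms_doc_counts is made up for illustration.

```python
from collections import Counter

# The two documents from the bulk request, reduced to the relevant fields.
DOCS = [
    {"keyword": "some keyword",
     "items": [{"item_property_1": "A"},
               {"item_property_1": "B"},
               {"item_property_1": "A"}]},
    {"keyword": "different keyword",
     "items": [{"item_property_1": "A"},
               {"item_property_1": "C"}]},
]


def flat_terms_doc_counts(docs):
    """Count, per term, how many documents contain it (keyword is ignored)."""
    counts = Counter()
    for doc in docs:
        # Assumption: without a nested mapping the items collapse into one
        # multi-valued field, so a document contributes each distinct term once.
        # Assumption: the default analyzer lowercases the values.
        distinct_terms = {item["item_property_1"].lower() for item in doc["items"]}
        counts.update(distinct_terms)
    return dict(counts)


print(flat_terms_doc_counts(DOCS))
```

Under these assumptions the counts come out as a=2, b=1, c=1: the value "A" appears three times across all items but in only two documents, and nothing ties a count to a particular keyword.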

EDIT3: Based on this: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-nested-type.html I figured I would try adding the nested type to the items property. To do so, I tried:

PUT /test/_mapping/test
{
    "test":{
        "properties": {
            "items": {
                "type": "nested",
                "properties": {
                    "item_property_1":{"type":"string"}
                }
            }
        }
    }
}

However, this returns an error:

{
   "error": "MergeMappingException[Merge failed with failures {[object mapping [items] can't be changed from non-nested to nested]}]",
   "status": 400
}

This is probably related to the warning at that URL: "changing an object type to nested type requires reindexing."

So, what do I do?

1 answer:

Answer 0: (score: 4)

Nice try, you were almost there! Here is what I came up with. Based on your mapping proposal, I used a mapping in which items has type nested.
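A nested mapping for items could look roughly like the sketch below, written as a Python dict in the ES 1.x syntax used in the question. Only the nested type on items comes from the text. The not_analyzed settings on keyword and item_property_1 are an assumption, inferred from the response further down, where "some keyword" and "A" appear as whole, case-preserved bucket keys; the name entry is likewise only illustrative.

```python
import json

# Hypothetical mapping body for PUT /test/_mapping/test (ES 1.x syntax).
mapping = {
    "test": {
        "properties": {
            # Assumption: not_analyzed keeps "some keyword" as a single term.
            "keyword": {"type": "string", "index": "not_analyzed"},
            "items": {
                # From the text: items must be a nested field.
                "type": "nested",
                "properties": {
                    "name": {"type": "string"},
                    # Assumption: not_analyzed keeps "A" from becoming "a".
                    "item_property_1": {"type": "string", "index": "not_analyzed"},
                },
            },
        }
    }
}

body = json.dumps(mapping, indent=2)
print(body)
```

The printed JSON is what would be sent as the request body; the index has to be created empty with this mapping before any documents are indexed.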

Note: you need to wipe and reindex your data, since a field's type cannot be changed from non-nested to nested.

Then I created some data using the bulk query you shared:

curl -XPOST localhost:9200/test/test/_bulk -d '
{ "index": {}}
{ "keyword": "some keyword", "items": [ { "name":"my first item", "item_property_1":"A" }, { "name":"my second item", "item_property_1":"B" }, { "name":"my third item", "item_property_1":"A" } ]}
{ "index": {}}
{ "keyword": "different keyword", "items": [ { "name":"cool item", "item_property_1":"A" }, { "name":"awesome item", "item_property_1":"C" } ]}
'

Finally, here is the aggregation query you can use to get the results you expect. We first bucket by keyword using a terms aggregation, and then, for each keyword, we bucket by the nested item_property_1 field. Since items is now of nested type, the key is to use a nested aggregation for items and then a terms sub-aggregation for the item_property_1 field.

{
  "size": 0,
  "aggregations": {
    "by_keyword": {
      "terms": {
        "field": "keyword"
      },
      "aggs": {
        "prop_1_count": {
          "nested": {
            "path": "items"
          },
          "aggs": {
            "prop_1": {
              "terms": {
                "field": "items.item_property_1"
              }
            }
          }
        }
      }
    }
  }
}
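Conceptually, the three levels of that aggregation (terms on keyword, nested on items, terms on items.item_property_1) amount to a group-by followed by a per-item count. Here is a rough pure-Python sketch of that logic, not Elasticsearch code; the function name keyword_then_nested_terms is made up, and it assumes keyword and item_property_1 are matched as exact, unanalyzed values.

```python
from collections import Counter, defaultdict

# The two documents from the bulk request, reduced to the relevant fields.
DOCS = [
    {"keyword": "some keyword",
     "items": [{"item_property_1": "A"},
               {"item_property_1": "B"},
               {"item_property_1": "A"}]},
    {"keyword": "different keyword",
     "items": [{"item_property_1": "A"},
               {"item_property_1": "C"}]},
]


def keyword_then_nested_terms(docs):
    """Group documents by keyword, then count items per item_property_1 value."""
    per_keyword = defaultdict(Counter)
    for doc in docs:
        # Outer terms aggregation: one bucket per exact keyword value.
        bucket = per_keyword[doc["keyword"]]
        # Nested aggregation: every item is counted as its own unit, so a
        # value that occurs twice in one document is counted twice.
        for item in doc["items"]:
            # Inner terms aggregation: one bucket per item_property_1 value.
            bucket[item["item_property_1"]] += 1
    return {keyword: dict(counts) for keyword, counts in per_keyword.items()}


print(keyword_then_nested_terms(DOCS))
```

The difference from a plain terms aggregation is the unit being counted: items rather than parent documents, which is why "A" reaches 2 under "some keyword".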

{
  ...
  "aggregations" : {
    "by_keyword" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "different keyword",       <---- keyword 1
        "doc_count" : 1,
        "prop_1_count" : {
          "doc_count" : 2,
          "prop_1" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [ {                <---- buckets for item_property_1
              "key" : "A",
              "doc_count" : 1
            }, {
              "key" : "C",
              "doc_count" : 1
            } ]
          }
        }
      }, {
        "key" : "some keyword",            <---- keyword 2
        "doc_count" : 1,
        "prop_1_count" : {
          "doc_count" : 3,
          "prop_1" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [ {                <---- buckets for item_property_1
              "key" : "A",
              "doc_count" : 2
            }, {
              "key" : "B",
              "doc_count" : 1
            } ]
          }
        }
      } ]
    }
  }
}

The JSON response shown above is what running that query on your dataset yields.
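The response is more deeply nested than the shape sketched in the question. If the flatter shape is needed, it can be derived on the client side. A minimal sketch, assuming the aggregation names by_keyword, prop_1_count and prop_1 from the query; the function name reshape, the output key item_property_1_aggregation and the trimmed SAMPLE input are illustrative only.

```python
# Trimmed copy of the "aggregations" part of the response (one keyword bucket).
SAMPLE = {
    "by_keyword": {
        "buckets": [
            {
                "key": "different keyword",
                "doc_count": 1,
                "prop_1_count": {
                    "doc_count": 2,
                    "prop_1": {
                        "buckets": [
                            {"key": "A", "doc_count": 1},
                            {"key": "C", "doc_count": 1},
                        ]
                    },
                },
            }
        ]
    }
}


def reshape(aggregations):
    """Flatten the keyword -> nested -> terms buckets into one row per keyword."""
    rows = []
    for keyword_bucket in aggregations["by_keyword"]["buckets"]:
        inner_buckets = keyword_bucket["prop_1_count"]["prop_1"]["buckets"]
        rows.append({
            "keyword": keyword_bucket["key"],
            "item_property_1_aggregation": [
                {"key": b["key"], "count": b["doc_count"]} for b in inner_buckets
            ],
        })
    return rows


print(reshape(SAMPLE))
```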