Elasticsearch: generating terms from an array using a script

Time: 2015-02-19 20:21:56

Tags: elasticsearch

I would love an explanation of why this happens and how to correct it.

Here is a snippet of the source document:

{
   "created_time":1412988495000,
   "tags":{
      "items":[
         {
            "tag_type":"Placement",
            "tag_id":"id1"
         },
         {
            "tag_type":"Product",
            "tag_id":"id2"
         }
      ]
   }
}

The following terms aggregation:

  "aggs":{
       "tags":{
          "terms":{
             "script":"doc['tags'].value != null ? doc['tags.items.tag_type'].value + ':' + doc['tags.items.tag_id'].value : ''",
             "size":2000,
             "exclude":{
                "pattern":"null:null"
             }
          }
       }
    }

returns:

   "buckets":[
      {
         "key":"Placement:id1",
         "doc_count":1
      },
      {
         "key":"Placement:id2",
         "doc_count":1
      }
   ]

...when I would expect:

   "buckets":[
      {
         "key":"Placement:id1",
         "doc_count":1
      },
      {
         "key":"Product:id2",
         "doc_count":1
      }
   ]

1 answer:

Answer 0 (score: 1)

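I would probably go with the nested type. I don't know all the details of your setup, but here is at least a proof of concept. I took out the "items" property, since I didn't need that many layers, and just used "tags" as the nested type. It could be added back if needed, I think.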
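As background on why the nested type helps: when an array of objects is not mapped as nested, Elasticsearch flattens it into one multi-valued field per property, so the link between a particular tag_type and its tag_id is not kept. The sketch below only imitates that flattening in plain Python; it is not Elasticsearch code, and the helper name flatten_object_array is made up for illustration.

```python
from collections import defaultdict


def flatten_object_array(items, prefix="tags.items."):
    """Imitate indexing a non-nested object array: one multi-valued field per key.

    Hypothetical helper for illustration only; not part of any Elasticsearch client.
    """
    flat = defaultdict(list)
    for item in items:
        for key, value in item.items():
            flat[prefix + key].append(value)
    return dict(flat)


items = [
    {"tag_type": "Placement", "tag_id": "id1"},
    {"tag_type": "Product", "tag_id": "id2"},
]

# Flattened view: two independent value lists, with no record of which
# tag_type belonged to which tag_id.
flat = flatten_object_array(items)

# Nested view: each object stays a separate unit, so pairing is unambiguous.
nested_pairs = [f"{item['tag_type']}:{item['tag_id']}" for item in items]

print(flat)
print(nested_pairs)
```

With only the flattened lists available, a script has no reliable way to tell that "Product" goes with "id2", which is consistent with the mismatched "Placement:id2" bucket in the question.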

So I set up an index with a "nested" property:

DELETE /test_index

PUT /test_index
{
   "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0
   },
   "mappings": {
      "doc": {
         "properties": {
            "created_time": {
               "type": "date"
            },
            "tags": {
               "type": "nested",
               "properties": {
                  "tag_type": {
                     "type": "string",
                     "index": "not_analyzed"
                  },
                  "tag_id": {
                     "type": "string",
                     "index": "not_analyzed"
                  }
               }
            }
         }
      }
   }
}

Then I added a couple of documents (note that the structure differs slightly from yours):

PUT /test_index/doc/1
{
   "created_time": 1412988495000,
   "tags": [
      {
         "tag_type": "Placement",
         "tag_id": "id1"
      },
      {
         "tag_type": "Product",
         "tag_id": "id2"
      }
   ]
}

PUT /test_index/doc/2
{
   "created_time": 1412988475000,
   "tags": [
      {
         "tag_type": "Type3",
         "tag_id": "id3"
      },
      {
         "tag_type": "Type4",
         "tag_id": "id3"
      }
   ]
}

Now a scripted terms aggregation inside a nested aggregation seems to do the trick:

POST /test_index/_search?search_type=count
{
   "query": {
      "match_all": {}
   },
   "aggs": {
      "tags": {
         "nested": { "path": "tags" },
         "aggs":{
             "tag_vals": {
                 "terms": {
                     "script": "doc['tag_type'].value+':'+doc['tag_id'].value"
                 }
             }
         }
      }
   }
}
...
{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "tags": {
         "doc_count": 4,
         "tag_vals": {
            "buckets": [
               {
                  "key": "Placement:id1",
                  "doc_count": 1
               },
               {
                  "key": "Product:id2",
                  "doc_count": 1
               },
               {
                  "key": "Type3:id3",
                  "doc_count": 1
               },
               {
                  "key": "Type4:id3",
                  "doc_count": 1
               }
            ]
         }
      }
   }
}
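To make the numbers in that response easier to check by hand, here is a rough Python simulation of what the nested aggregation plus scripted terms aggregation computes over the two sample documents. It is only a sketch of the counting logic, not how Elasticsearch implements it; the function name nested_terms is hypothetical, and the bucket ordering (count descending, then key ascending) is an assumption chosen to match the response shown.

```python
from collections import Counter

# The two sample documents indexed above.
docs = [
    {
        "created_time": 1412988495000,
        "tags": [
            {"tag_type": "Placement", "tag_id": "id1"},
            {"tag_type": "Product", "tag_id": "id2"},
        ],
    },
    {
        "created_time": 1412988475000,
        "tags": [
            {"tag_type": "Type3", "tag_id": "id3"},
            {"tag_type": "Type4", "tag_id": "id3"},
        ],
    },
]


def nested_terms(documents, path="tags"):
    """Count 'tag_type:tag_id' keys, treating each nested object as its own unit."""
    nested_objects = [obj for doc in documents for obj in doc.get(path, [])]
    counts = Counter(f"{o['tag_type']}:{o['tag_id']}" for o in nested_objects)
    # Assumed ordering: highest count first, ties broken by key.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {
        "doc_count": len(nested_objects),
        "buckets": [{"key": k, "doc_count": c} for k, c in ordered],
    }


result = nested_terms(docs)
print(result)
```

The outer doc_count of 4 corresponds to the four nested tag objects rather than the two top-level documents, and each key appears once because every tag_type and tag_id pair is unique in the sample data.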

Here is the code I used:

http://sense.qbox.io/gist/4ceaf8693f85ff257c2fd0639ba62295f2e5e8c5