How to correctly aggregate on a field that is a list in Elasticsearch

Time: 2016-01-28 23:39:29

Tags: elasticsearch

Currently, my ES logs are indexed so that some fields hold a list rather than a single value.

Example:

_source:{
    "field1": ["item1", "item2", "item3"],
    "field2": "something",
    "field3": "something_else"
}

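Of course, the lists are not always the same length. I am trying to find a way to aggregate the number of logs per item (so some logs will be counted more than once).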

I know I have to use aggs, but I don't know how to form the correct query (the part after -d). Can anyone help?

2 Answers:

Answer 0: (score: 0)

You can use the following query, which combines a terms aggregation with top_hits.

{
  "size": 0,
  "aggs": {
    "group": {
      "terms": {
        "script": "_source.field1.each{}"
      },
      "aggs": {
        "top_hits_log": {
          "top_hits": {}
        }
      }
    }
  }
}

The output will be:

 "buckets": [
        {
           "key": "item1",
           "doc_count": 3,
           "top_hits_log": {
              "hits": {
                 "total": 3,
                 "max_score": 1,
                 "hits": [
                    {
                       "_index": "so",
                       "_type": "test",
                       "_id": "1",
                       "_score": 1,
                       "_source": {
                          "field1": [
                             "item1",
                             "item2",
                             "item3"
                          ],
                          "field2": "something1"
                       }
                    },
                    {
                       "_index": "so",
                       "_type": "test",
                       "_id": "2",
                       "_score": 1,
                       "_source": {
                          "field1": [
                             "item1"
                          ],
                          "field2": "something2"
                       }
                    },
                    {
                       "_index": "so",
                       "_type": "test",
                       "_id": "3",
                       "_score": 1,
                       "_source": {
                          "field1": [
                             "item1",
                             "item2"
                          ],
                          "field2": "something3"
                       }
                    }
                 ]
              }
           }
        },
        {
           "key": "item2",
           "doc_count": 2,
           "top_hits_log": {
              "hits": {
                 "total": 2,
                 "max_score": 1,
                 "hits": [
                    {
                       "_index": "so",
                       "_type": "test",
                       "_id": "1",
                       "_score": 1,
                       "_source": {
                          "field1": [
                             "item1",
                             "item2",
                             "item3"
                          ],
                          "field2": "something1"
                       }
                    },
                    {
                       "_index": "so",
                       "_type": "test",
                       "_id": "3",
                       "_score": 1,
                       "_source": {
                          "field1": [
                             "item1",
                             "item2"
                          ],
                          "field2": "something3"
                       }
                    }
                 ]
              }
           }
        },
        {
           "key": "item3",
           "doc_count": 1,
           "top_hits_log": {
              "hits": {
                 "total": 1,
                 "max_score": 1,
                 "hits": [
                    {
                       "_index": "so",
                       "_type": "test",
                       "_id": "1",
                       "_score": 1,
                       "_source": {
                          "field1": [
                             "item1",
                             "item2",
                             "item3"
                          ],
                          "field2": "something1"
                       }
                    }
                 ]
              }
           }
        }
     ]
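
To make the doc_count values above easier to verify, here is a minimal Python sketch of the counting that the buckets reflect: each document contributes once to every distinct item in its list. The sample documents mirror the three hits shown in the output; the function name count_docs_per_item is made up for this illustration and is not part of Elasticsearch.

```python
from collections import Counter

# Sample documents mirroring the three hits shown in the output above.
docs = [
    {"field1": ["item1", "item2", "item3"], "field2": "something1"},
    {"field1": ["item1"], "field2": "something2"},
    {"field1": ["item1", "item2"], "field2": "something3"},
]


def count_docs_per_item(documents, field):
    """Count, for each distinct value, how many documents contain it."""
    counts = Counter()
    for doc in documents:
        # set() so that a value repeated inside one document counts once,
        # which is how a bucket's doc_count behaves (documents, not occurrences).
        counts.update(set(doc.get(field, [])))
    return counts


doc_counts = count_docs_per_item(docs, "field1")
for key, n in doc_counts.most_common():
    print(key, n)
```

A document with several items lands in several buckets, so the doc_count values can add up to more than the number of documents. That is the "counted more than once" behaviour the question asks for.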

Make sure dynamic scripting is enabled by setting script.disable_dynamic: false.

Hope this helps.

Answer 1: (score: 0)

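There is no need for scripting. It will be slow, especially because of the _source parsing. You also need to make sure field1 is not_analyzed; otherwise you will get odd results, because the terms aggregation runs on the unique tokens in the inverted index.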

{
  "size": 0,
  "aggs": {
    "unique_items": {
      "terms": {
        "field": "field1",
        "size": 100
      },
      "aggs": {
        "documents": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  }
}
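
The not_analyzed point above can be illustrated with a small Python sketch. It assumes a deliberately simplified analyzer (lowercase, split on non-alphanumeric characters) as a rough stand-in for what an analyzed string field does; the real analyzer is more elaborate, and the sample values are invented for the illustration.

```python
import re
from collections import Counter


def analyzed_tokens(value):
    """Simplified stand-in for an analyzed field: lowercase and split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]


def not_analyzed_tokens(value):
    """A not_analyzed field indexes the whole value as a single term."""
    return [value]


def terms_buckets(documents, field, tokenize):
    """Count documents per indexed term, the way a terms aggregation sees them."""
    counts = Counter()
    for doc in documents:
        terms = set()
        for value in doc[field]:
            terms.update(tokenize(value))
        counts.update(terms)
    return dict(counts)


# Invented sample values containing a separator the simplified analyzer splits on.
docs = [
    {"field1": ["Error-Timeout", "Error-Disk"]},
    {"field1": ["Error-Timeout"]},
]

analyzed = terms_buckets(docs, "field1", analyzed_tokens)
exact = terms_buckets(docs, "field1", not_analyzed_tokens)
print("analyzed:    ", analyzed)
print("not_analyzed:", exact)

# Mapping fragment for Elasticsearch versions that use the "string" type
# (an assumption about the version in use; newer versions differ).
mapping_fragment = {"field1": {"type": "string", "index": "not_analyzed"}}
```

With the analyzed variant the buckets are keyed by fragments such as "error" and "timeout" rather than by the original list items, which is the kind of odd result the answer warns about.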

Here the size inside the terms aggregation is 100; change it according to how many unique values you expect (the default is 10).
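
For readers who would rather send the request from a script than with curl -d, here is a hedged Python sketch that builds the same query body and posts it to the _search endpoint. The base URL and index name passed to run_search are placeholders you would supply yourself, and the helper names are made up for this example; only the query structure comes from the answer above.

```python
import json
import urllib.request


def build_terms_query(field, bucket_size=100, hits_per_bucket=10):
    """Build the aggregation body from the answer; bucket_size is the terms 'size'."""
    return {
        "size": 0,  # suppress the normal hit list, return aggregations only
        "aggs": {
            "unique_items": {
                "terms": {"field": field, "size": bucket_size},
                "aggs": {"documents": {"top_hits": {"size": hits_per_bucket}}},
            }
        },
    }


def run_search(base_url, index, body):
    """POST the body to <base_url>/<index>/_search (both arguments are placeholders)."""
    request = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def bucket_counts(search_response):
    """Reduce a search response to {item: doc_count}."""
    buckets = search_response["aggregations"]["unique_items"]["buckets"]
    return {b["key"]: b["doc_count"] for b in buckets}


query = build_terms_query("field1")
print(json.dumps(query, indent=2))
```

If the number of distinct items is larger than bucket_size, the response only contains the top buckets by document count, so raise the value when the item list looks truncated.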

Hope this helps!