Question

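How can I aggregate by the whole array in each document, rather than by each individual value in the array? For example, I have several documents like these: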

{'some_field': [1,2]}
{'some_field': [1]}
{'some_field': [1]}
{'some_field': [7,2]}

With a simple aggregation query like this:

{
  "aggs": {
    "agg_name": {
      "terms": {
        "field": "some_field"
      }
    }
  },
  "size": 0
}

I get a result like this:

"buckets": [
        {
          "key": "1",
          "doc_count": 3
        },
        {
          "key": "2",
          "doc_count": 2
        },
        ...
]
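The per-value counts above follow from how a terms aggregation treats a multi-valued field: each distinct value in a document's array counts that document once. Below is a minimal client-side Python sketch of that counting on the four example documents. It is a simulation, not Elasticsearch code; per_value_buckets is a hypothetical helper, and the ordering (doc_count descending, then key ascending) is meant to mirror the terms aggregation default.

```python
from collections import Counter

# The four example documents from the question.
docs = [
    {"some_field": [1, 2]},
    {"some_field": [1]},
    {"some_field": [1]},
    {"some_field": [7, 2]},
]


def per_value_buckets(docs, field):
    """Hypothetical helper: count documents per distinct array value,
    counting each value at most once per document."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc.get(field, [])))
    # Sort by doc_count descending, then by key ascending.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


print(per_value_buckets(docs, "some_field"))  # [(1, 3), (2, 2), (7, 1)]
```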

But I want a whole-array view like this:

"buckets": [
        {
          "key": [1],
          "doc_count": 2
        },
        {
          "key": [1,2],
          "doc_count": 1
        },
        {
          "key": [7,2],
          "doc_count": 1
        }
]
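What is being asked for amounts to using the whole array as the bucket key. A client-side Python sketch of that grouping on the same four documents follows; whole_array_buckets is a hypothetical helper. It uses the array in its stored order as the key, so [7, 2] and [2, 7] would be separate buckets.

```python
from collections import Counter

# The four example documents from the question.
docs = [
    {"some_field": [1, 2]},
    {"some_field": [1]},
    {"some_field": [1]},
    {"some_field": [7, 2]},
]


def whole_array_buckets(docs, field):
    """Hypothetical helper: use the entire array, in stored order, as the key."""
    # Tuples are hashable, so they can serve as Counter keys.
    counts = Counter(tuple(doc.get(field, [])) for doc in docs)
    # Sort by doc_count descending, then by key ascending.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


for key, doc_count in whole_array_buckets(docs, "some_field"):
    print(list(key), doc_count)
# [1] 2
# [1, 2] 1
# [7, 2] 1
```

If [7, 2] and [2, 7] should count as the same array, sorting each array before building the tuple would merge them.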

Answer 1

I was looking for the same aggregation, but it still doesn't exist. So I worked around it with a Painless script:

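POST some_index/_search
{
  "size": 0,
  "aggs": {
    "myaggs": {
      "terms": {
        "size": 100,
        "script": {
          "lang": "painless",
          "source": """
            // Build one string key per document from all values of the field.
            // data.some_field.keyword is the answerer's field path; adjust it to your mapping.
            def myString = "";
            def values = doc['data.some_field.keyword'];
            for (int i = 0; i < values.size(); ++i) {
              if (i > 0) {
                myString += ", ";   // separator between values, no trailing comma
              }
              myString += values[i];
            }
            return myString;
          """
        }
      }
    }
  }
}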
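One consequence worth knowing: the script reads doc values, and for a keyword field those come back sorted (lexicographically, as strings) and de-duplicated, not in the order they appear in _source. Below is a Python sketch of the key the script would produce under that assumption; script_key is a hypothetical helper, and it assumes values are joined with ", " with no trailing separator.

```python
from collections import Counter

# The four example documents from the question.
docs = [
    {"some_field": [1, 2]},
    {"some_field": [1]},
    {"some_field": [1]},
    {"some_field": [7, 2]},
]


def script_key(values):
    """Hypothetical helper mimicking the Painless key for a keyword field.
    Assumes doc values are strings, sorted and de-duplicated."""
    return ", ".join(sorted({str(v) for v in values}))


counts = Counter(script_key(doc["some_field"]) for doc in docs)

print(script_key([7, 2]))   # 2, 7
print(script_key([10, 9]))  # 10, 9  (string order, not numeric order)
```

So the bucket for the last example document would be keyed "2, 7" rather than "7, 2", and an array such as [1, 1] would collapse to "1".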

Answer 2

I think this is your answer:

Add a sub-field to the mapping (this uses the pre-5.0 string/not_analyzed syntax; in Elasticsearch 5.x and later a keyword sub-field serves the same purpose):

PUT /test_index
{
  "mappings": {
    "doc": {
      "properties": {
        "states": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}

Then add a few documents:

POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"states":["New York","New Jersey","California"]}
{"index":{"_id":2}}
{"states":["New York","North Carolina","North Dakota"]}

Run the aggregation on the sub-field:

POST /test_index/_search
{
  "size": 0,
  "aggs": {
    "states": {
      "terms": {
        "field": "states.raw",
        "size": 10
      }
    }
  }
}

Returns:

{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "states": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "New York",
               "doc_count": 2
            },
            {
               "key": "California",
               "doc_count": 1
            },
            {
               "key": "New Jersey",
               "doc_count": 1
            },
            {
               "key": "North Carolina",
               "doc_count": 1
            },
            {
               "key": "North Dakota",
               "doc_count": 1
            }
         ]
      }
   }
}
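The raw sub-field matters because an analyzed string field is split into lower-cased tokens, so a terms aggregation on it would bucket words like "new" and "york" instead of whole state names. Below is a rough Python sketch comparing the two on the same two documents. rough_analyze and terms_buckets are hypothetical helpers; rough_analyze only lowercases and splits on non-alphanumeric characters and is not Elasticsearch's actual standard analyzer.

```python
import re
from collections import Counter

# The two documents indexed above.
docs = [
    {"states": ["New York", "New Jersey", "California"]},
    {"states": ["New York", "North Carolina", "North Dakota"]},
]


def rough_analyze(text):
    """Hypothetical stand-in for an analyzed field: lowercase, split on
    non-alphanumeric characters, drop empty tokens."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def terms_buckets(docs, field, analyze=None):
    """Count documents per term; each term counts once per document."""
    counts = Counter()
    for doc in docs:
        terms = set()
        for value in doc.get(field, []):
            terms.update(analyze(value) if analyze else [value])
        counts.update(terms)
    # Sort by doc_count descending, then by key ascending.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


print(terms_buckets(docs, "states"))                         # like states.raw
print(terms_buckets(docs, "states", analyze=rough_analyze))  # like an analyzed field
```

Either way each value still counts once per document, so this produces per-value buckets rather than one bucket per whole array.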
