Question

Note that this is not a "how do I get counts of distinct values" question. I want the documents, not the counts.

Suppose I have this mapping:

country, color, height, weight

I have indexed these documents:

1. RU, red, 180, 90
2. BY, green, 170, 80
3. BY, blue, 180, 75
4. KZ, blue, 180, 95
5. KZ, red, 185, 100
6. KZ, red, 175, 80
7. KZ, red, 170, 80

I want to run a query along the lines of groupby(country, color, doc_limit=2) that returns something like this:

{
  "RU": {
    "red": [
      (doc 1. RU, red, 180, 90)
    ]
  },
  "BY": {
    "green": [
      (doc 2)
    ],
    "blue": [
      (doc 3)
    ]
  },
  "KZ": {
    "blue": [
      (doc 4)
    ],
    "red": [
      (doc 5),
      (doc 6)
    ]
  }
}

There should be no more than 2 documents in each bucket.

How can I do this?

Answer 1

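This can be done with a terms aggregation on the country field, combined with a terms sub-aggregation on the color field, and finally a top_hits aggregation that returns up to 2 matching documents per bucket: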

{
   "size": 0,
   "aggs": {
      "countries": {
         "terms": {
            "field": "country"
         },
         "aggs": {
            "colors": {
               "terms": {
                  "field": "color"
               },
               "aggs": {
                  "docs": {
                     "top_hits": {
                        "size": 2
                     }
                  }
               }
            }
         }
      }
   }
}
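
If you need this for an arbitrary list of fields, or want the response in the nested shape from the question, something like the following Python sketch might help. It is only an illustration: the helper names build_groupby_query and group_docs and the by_<field> aggregation names are made up here, and it assumes the response follows the usual aggregations layout (buckets with a key, and top_hits results under hits.hits with _source).

```python
from typing import Any, Dict, List


def build_groupby_query(fields: List[str], doc_limit: int) -> Dict[str, Any]:
    """Nest one terms aggregation per field, with top_hits innermost.

    Hypothetical helper: the "by_<field>" and "docs" aggregation names
    are arbitrary labels chosen for this sketch.
    """
    # Innermost level: return up to doc_limit documents per bucket.
    aggs: Dict[str, Any] = {"docs": {"top_hits": {"size": doc_limit}}}
    # Wrap from the last field outwards so the first field ends up outermost.
    for field in reversed(fields):
        aggs = {"by_" + field: {"terms": {"field": field}, "aggs": aggs}}
    # "size": 0 suppresses the regular hit list; only aggregations come back.
    return {"size": 0, "aggs": aggs}


def group_docs(node: Dict[str, Any], fields: List[str]) -> Any:
    """Reshape an aggregation result into {value: {value: [docs]}}.

    `node` is the "aggregations" object of the search response (or one
    bucket inside it when called recursively).
    """
    if not fields:
        # Innermost bucket: collect the stored documents from top_hits.
        return [hit["_source"] for hit in node["docs"]["hits"]["hits"]]
    buckets = node["by_" + fields[0]]["buckets"]
    return {bucket["key"]: group_docs(bucket, fields[1:]) for bucket in buckets}
```

build_groupby_query(["country", "color"], 2) produces the same structure as the query above, apart from the aggregation names. Send it as the search request body, then pass the "aggregations" part of the response to group_docs with the same field list. Keep in mind that a terms aggregation only returns the top 10 buckets by default, so you may need to set "size" on the terms aggregations if you have more distinct values than that.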
