Elasticsearch nested aggregation returns duplicate results

Time: 2017-07-21 14:23:18

Tags: elasticsearch elasticsearch-5

With this mapping:

PUT pizzas
{
  "mappings": {
    "pizza": {
      "properties": {
        "name": {
          "type": "keyword"
        },
        "types": {
          "type": "nested",
          "properties": {
            "topping": {
              "type": "keyword"
            },
            "base": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}

And this data:

PUT pizzas/pizza/1
{
  "name": "meat",
  "types": [
    {
      "topping": "bacon",
      "base": "normal"
    },
    {
      "topping": "pepperoni",
      "base": "normal"
    }
  ]
}

PUT pizzas/pizza/2
{
  "name": "veg",
  "types": [
    {
      "topping": "broccoli",
      "base": "normal"
    }
  ]
}

If I run this nested aggregation query:

GET pizzas/_search
{
  "size": 0,
  "aggs": {
    "types_agg": {
      "nested": {
        "path": "types"
      },
      "aggs": {
        "base_agg": {
          "terms": {
            "field": "types.base"
          }
        }
      }
    }
  }
}

I get this result:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "types_agg": {
      "doc_count": 3,
      "base_agg": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "normal",
            "doc_count": 3
          }
        ]
      }
    }
  }
}

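I expected the aggregation to return a doc_count of 2, since only two documents match my query. Instead it returns 3, apparently because it counts the three nested objects (each one is indexed as a separate hidden document) rather than the two parent documents.

Is there any way to make it return the count of unique documents?

(Tested on Elasticsearch 5.4.3)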

1 Answer:

Answer 0: (score: 1)

Found the answer shortly after asking the question.

Change the aggregation query to:

GET pizzas/_search
{
  "size": 0,
  "aggs": {
    "types_agg": {
      "nested": {
        "path": "types"
      },
      "aggs": {
        "base_agg": {
          "terms": {
            "field": "types.base"
          },
          "aggs": {
            "top_reverse_nested": {
              "reverse_nested": {}
            }
          }
        }
      }
    }
  }
}

Which produces this result:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "types_agg": {
      "doc_count": 3,
      "base_agg": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "normal",
            "doc_count": 3,
            "top_reverse_nested": {
              "doc_count": 2
            }
          }
        ]
      }
    }
  }
}

The important part added to the query is:

"aggs": {
    "top_reverse_nested": {
        "reverse_nested": {}
    }
}

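reverse_nested joins back from the nested objects to the root of the document, so its doc_count is the number of unique parent documents in each bucket (2 here), while the bucket's own doc_count still counts nested objects (3).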
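To make the difference between the two counts concrete, here is a small toy model in Python. It is not Elasticsearch code and makes no claims about the actual implementation; it only assumes that every nested object behaves like its own hidden document that remembers its root. The names `nested_docs` and `count_by_base` are made up for this illustration.

```python
# Toy model, not Elasticsearch code: each nested object is treated as its
# own hidden document that remembers which root document it belongs to.
pizzas = {
    "1": {"name": "meat", "types": [
        {"topping": "bacon", "base": "normal"},
        {"topping": "pepperoni", "base": "normal"},
    ]},
    "2": {"name": "veg", "types": [
        {"topping": "broccoli", "base": "normal"},
    ]},
}

# Flatten to (root_id, nested_object) pairs, one per nested object.
nested_docs = [
    (root_id, nested)
    for root_id, pizza in pizzas.items()
    for nested in pizza["types"]
]


def count_by_base(docs):
    """Return (nested object count, unique root count) per base value."""
    nested_counts = {}
    root_ids = {}
    for root_id, nested in docs:
        base = nested["base"]
        # What the terms bucket counts: one per nested object.
        nested_counts[base] = nested_counts.get(base, 0) + 1
        # What joining back to the root counts: each root only once.
        root_ids.setdefault(base, set()).add(root_id)
    root_counts = {base: len(ids) for base, ids in root_ids.items()}
    return nested_counts, root_counts


nested_counts, root_counts = count_by_base(nested_docs)
print(nested_counts)  # {'normal': 3}
print(root_counts)    # {'normal': 2}
```

The first dictionary corresponds to the doc_count of the "normal" bucket, the second to the doc_count under top_reverse_nested.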

You can read more about reverse_nested in the Elasticsearch documentation.
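If you consume the response in code, the unique count sits one level deeper than the usual bucket doc_count. A minimal Python sketch, assuming the response has already been parsed into a dict shaped like the result above; the helper name `unique_doc_counts` is hypothetical, and the aggregation names are the ones used in this query:

```python
# Trimmed copy of the "aggregations" part of the response shown above.
response = {
    "aggregations": {
        "types_agg": {
            "doc_count": 3,
            "base_agg": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                    {
                        "key": "normal",
                        "doc_count": 3,
                        "top_reverse_nested": {"doc_count": 2},
                    }
                ],
            },
        }
    }
}


def unique_doc_counts(resp, nested_agg="types_agg", terms_agg="base_agg",
                      reverse_agg="top_reverse_nested"):
    """Map each terms bucket key to its reverse_nested (root document) count."""
    buckets = resp["aggregations"][nested_agg][terms_agg]["buckets"]
    return {bucket["key"]: bucket[reverse_agg]["doc_count"] for bucket in buckets}


counts = unique_doc_counts(response)
print(counts)  # {'normal': 2}
```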