Cardinality aggregation inside a filter aggregation

Asked: 2017-03-07 21:42:31

Tags: elasticsearch

I am trying to use a cardinality aggregation to count distinct values.

Here is my query:

{
    "size": 100,
    "_source": ["awardeeName"],
    "query": {
        "match_phrase": { "awardeeName": "The President and Fellows of Harvard College" }
    },
    "aggs": {
        "awardeeName": {
            "filter": { "query": { "match_phrase": { "awardeeName": "The President and Fellows of Harvard College" } } },
            "aggs": {
                "distinct": { "cardinality": { "field": "awardeeName" } }
            }
        }
    }
}

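I query some text with match_phrase, aggregate with a filter that uses the same match phrase, and then call cardinality inside that filter. In the result, the hit count and the filter aggregation's doc_count agree, but the cardinality shows a different number, larger than both the filter count and the total hits. Here is the result: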

{
    "took": 37,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 3,
        "max_score": 13.516766,
        "hits": [
            {
                "_index": "development",
                "_type": "document",
                "_id": "140a3f5b-e876-4542-b16d-56c3c5ae0e58",
                "_score": 13.516766,
                "_source": {
                    "awardeeName": "The President and Fellows of Harvard College"
                }
            },
            {
                "_index": "development",
                "_type": "document",
                "_id": "5c668b06-c612-4349-8735-2a79ee2bb55e",
                "_score": 12.913888,
                "_source": {
                    "awardeeName": "The President and Fellows of Harvard College"
                }
            },
            {
                "_index": "development",
                "_type": "document",
                "_id": "a9560519-1b2a-4e64-b85f-4645a41d5810",
                "_score": 12.913888,
                "_source": {
                    "awardeeName": "The President and Fellows of Harvard College"
                }
            }
        ]
    },
    "aggregations": {
        "awardeeName": {
            "doc_count": 3,
            "distinct": {
                "value": 7
            }
        }
    }
}

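I expected the cardinality to apply to the filter's results, but here it shows 7. Why 7? How can the number of distinct values exceed the total number of hits?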

2 answers:

Answer 0: (score: 1)

The cardinality aggregation on the awardeeName field is counting the number of distinct tokens that appear in that field across all matching documents.

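In your case, the awardeeName field in all three matching documents contains exactly the same value, The President and Fellows of Harvard College. That value is made up of 7 tokens, which is why you see a result of 7.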
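The token counting described above can be reproduced offline with a small Python sketch. The analyze function here is a hypothetical stand-in that only approximates an analyzed text field (split on non-alphanumeric characters, then lowercase); it is not Elasticsearch's actual analyzer, and the three values are copied from the hits in the question.

```python
import re

def analyze(text):
    """Rough approximation of an analyzed text field:
    split on non-alphanumeric characters and lowercase each token.
    This is an illustrative assumption, not Elasticsearch's real analyzer."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]

# The awardeeName values of the three hits from the question.
docs = ["The President and Fellows of Harvard College"] * 3

# What cardinality sees on an analyzed field: distinct tokens across all docs.
distinct_tokens = set()
for value in docs:
    distinct_tokens.update(analyze(value))

# What the question expected: distinct whole values.
distinct_values = set(docs)

print(len(docs))             # 3 hits
print(len(distinct_tokens))  # 7 distinct tokens
print(len(distinct_values))  # 1 distinct whole value
```

Under these assumptions the 7 comes from the seven words of the single name, not from seven different awardees.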

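What you probably want is for The President and Fellows of Harvard College to be counted as a single token. For that you need a keyword field (rather than a text one), and you need to use that field in your cardinality aggregation.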
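As a minimal sketch of that suggestion, the request body below is built as a plain Python dict and printed as JSON. It assumes the index mapping has a keyword sub-field named awardeeName.keyword; that name is hypothetical and does not appear in the question, so substitute whatever keyword field your mapping actually defines. Because aggregations already run over the documents matched by the query, the sketch puts the cardinality aggregation at the top level instead of repeating the match_phrase in a filter.

```python
import json

phrase = "The President and Fellows of Harvard College"

# Assumption: "awardeeName.keyword" is a keyword (not analyzed) sub-field.
# The name is hypothetical; check your own mapping.
body = {
    "size": 0,  # only the aggregation result is needed here
    "query": {
        "match_phrase": {"awardeeName": phrase}
    },
    "aggs": {
        "distinct": {
            "cardinality": {"field": "awardeeName.keyword"}
        }
    },
}

# Print the JSON that would be sent as the _search request body.
print(json.dumps(body, indent=4))
```

With a keyword field the whole name is indexed as one term, so three documents carrying the same name should yield a cardinality of 1.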

Answer 1: (score: 0)

Example:

GET calserver-2021.04.1*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "method.keyword": "searchUser"
          }
        },
        {
          "term": {
            "statusCode": "500"
          }
        }
      ]
    }
  },
  "aggs": {
    "username_count": {
      "cardinality": {
        "field": "username.keyword",
        "precision_threshold": 40000
      }
    }
  }
}