Is there a way to run aggregations only on the query hits in Elasticsearch?

Time: 2019-07-24 03:55:20

Tags: elasticsearch lucene elasticsearch-aggregation

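I have an ES query with which I want to retrieve the 100 elements that match the query criteria and then aggregate over just those values. What happens instead: if I set size to 100, the query returns 100 hits and the aggregation returns 100 buckets, but the hits do not correspond to the values in the buckets.

I tried loading all the values with "size": 0, but I have a lot of records and that takes a long time.

I also tried using 2 queries (with a rather cumbersome terms agg), but I would like to do this in a single query if possible. Is there any way to achieve this?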

{
  "size": 10,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "amount": {
              "gte": 10000,
              "lte": 20000
            }
          }
        }
      ]
    }
  },
  "_source": {
    "include": ["id", "amount"]
  },
  "aggs": {
    "ID": {
      "terms": {
        "field": "id"
      },
      "aggs": {
        "SumAgg": {
          "sum": {
            "field": "paidAmount"
          }
        }
      }
    }
  }
}

Edit:

Here is the response:

  "hits": {
    "total": 712,
    "max_score": 1,
    "hits": [
      {
        "_score": 1,
        "_source": {
          "amount": 15732,
          "id": 18xxxxxxx108
        }
      },
      {
        "_score": 1,
        "_source": {
          "amount": 11485,
          "id": 33xxxxxxx107
        }
      },
      {
        "_score": 1,
        "_source": {
          "amount": 16757,
          "id": 34xxxxxxx286
        }
      },
      {
        "_score": 1,
        "_source": {
          "amount": 16134,
          "id": 29xxxxxxx018
        }
      },
      {
        "_score": 1,
        "_source": {
          "amount": 11767,
          "id": 11xxxxxxx017
        }
      },
      {
        "_score": 1,
        "_source": {
          "amount": 16744,
          "id": 38xxxxxxx106
        }
      },
      {
        "_score": 1,
        "_source": {
          "amount": 10587,
          "id": 34xxxxxxx113
        }
      },
      {
        "_score": 1,
        "_source": {
          "amount": 18704,
          "id": 34xxxxxxx177
        }
      },
      {
        "_score": 1,
        "_source": {
          "amount": 10077,
          "id": 13xxxxxxx306
        }
      },
      {
        "_score": 1,
        "_source": {
          "amount": 12812,
          "id": 46xxxxxxx334
        }
      }
    ]
  },
  "aggregations": {
    "ID": {
      "doc_count_error_upper_bound": 7,
      "sum_other_doc_count": 702,
      "buckets": [
        {
          "key": 24,
          "doc_count": 1,
          "SumAgg": {
            "value": 17176
          }
        },
        {
          "key": 27,
          "doc_count": 1,
          "SumAgg": {
            "value": 19924
          }
        },
        {
          "key": 81,
          "doc_count": 1,
          "SumAgg": {
            "value": 19784
          }
        },
        {
          "key": 93,
          "doc_count": 1,
          "SumAgg": {
            "value": 10942
          }
        },
        {
          "key": 124,
          "doc_count": 1,
          "SumAgg": {
            "value": 12337
          }
        },
        {
          "key": 148,
          "doc_count": 1,
          "SumAgg": {
            "value": 18604
          }
        },
        {
          "key": 158,
          "doc_count": 1,
          "SumAgg": {
            "value": 14680
          }
        },
        {
          "key": 217,
          "doc_count": 1,
          "SumAgg": {
            "value": 17295
          }
        },
        {
          "key": 273,
          "doc_count": 1,
          "SumAgg": {
            "value": 10989
          }
        },
        {
          "key": 321,
          "doc_count": 1,
          "SumAgg": {
            "value": 13917
          }
        }
      ]
    }
  }

I want the IDs to be the same in both contexts, that is, in the hits and in the aggregation buckets.

1 answer:

Answer 0 (score: 0)

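I just realized that Elasticsearch has an aggregation called the Sampler Aggregation, which lets you run your aggregations on only a sample of the top-scoring documents.

It was an experimental feature up to version 5.x, but it appears to have been released as a regular feature from version 6.0 onward. I had missed this!! :(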
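For context, the counts in the response posted in the question suggest why the buckets do not line up with the hits: a plain terms aggregation is computed over every document that matches the query, not over the 10 hits that are returned. The Python sketch below only redoes that arithmetic with the counts copied from the question's response; `docs_covered_by_terms_agg` is a hypothetical helper name, and the bucket keys and sums are left out for brevity.

```python
def docs_covered_by_terms_agg(terms_agg: dict) -> int:
    """Count the documents a terms aggregation accounted for,
    including those that fell outside the returned buckets."""
    in_buckets = sum(bucket["doc_count"] for bucket in terms_agg["buckets"])
    return in_buckets + terms_agg["sum_other_doc_count"]


# Counts copied from the response in the question (keys and SumAgg values omitted).
response = {
    "hits": {"total": 712},
    "aggregations": {
        "ID": {
            "sum_other_doc_count": 702,
            "buckets": [{"doc_count": 1} for _ in range(10)],
        }
    },
}

covered = docs_covered_by_terms_agg(response["aggregations"]["ID"])
print(covered, response["hits"]["total"])  # 712 712
```

The 10 returned buckets plus the 702 documents in other buckets add up to the 712 total matches, so the aggregation saw all matching documents while the hit list shows only 10 of them. Treat this as a sanity check rather than an exact proof, since terms counts can be approximate on an index with several shards.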

Below is what the query would look like. Note the sampler aggregation (named mysampler here) that wraps the original terms aggregation:

Query:

POST <your_index_name>/_search
{
  "size": 10,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "amount": {
              "gte": 10000,
              "lte": 20000
            }
          }
        }
      ]
    }
  },
  "_source": {
    "include": [
      "id",
      "amount"
    ]
  },
  "aggs": {
    "mysampler": {
      "sampler": {
        "shard_size": 10
      },
      "aggs": {
        "ID": {
          "terms": {
            "field": "id"
          },
          "aggs": {
            "SumAgg": {
              "sum": {
                "field": "amount"
              }
            }
          }
        }
      }
    }
  }
}
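If you build the request in code, it may help to derive the hit size and the sampler's shard_size from one value so the two cannot drift apart. The following is a minimal Python sketch under that assumption; `build_sampled_query` is a hypothetical helper, not part of any Elasticsearch client, and the `"size"` passed to the terms aggregation is my addition so that asking for more than 10 hits also returns more than the default 10 buckets.

```python
import json


def build_sampled_query(gte: int, lte: int, size: int = 10) -> dict:
    """Build a search body whose aggregations run on a sample
    that is the same size as the requested hit list."""
    return {
        "size": size,
        "query": {
            "bool": {"must": [{"range": {"amount": {"gte": gte, "lte": lte}}}]}
        },
        "_source": {"include": ["id", "amount"]},
        "aggs": {
            "mysampler": {
                # shard_size applies per shard, so on an index with several
                # shards the overall sample can be larger than `size`.
                "sampler": {"shard_size": size},
                "aggs": {
                    "ID": {
                        # Assumption: request as many buckets as hits.
                        "terms": {"field": "id", "size": size},
                        "aggs": {"SumAgg": {"sum": {"field": "amount"}}},
                    }
                },
            }
        },
    }


body = build_sampled_query(10000, 20000, size=10)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the _search request above with whichever client you use. Because shard_size is a per-shard limit, the buckets are only expected to match the returned hits one-to-one when the index has a single shard; with more shards the sample may contain extra documents.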

Hope this helps. If you find it useful, feel free to accept it as the answer and/or upvote!!