Question

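I am new to Elasticsearch, and there does not seem to be an easy way to create an aggregation and then distribute the resulting doc_count values into buckets once that first aggregation has finished. For example, given the data set below, I want to create 4 buckets and group profiles by the number of transactions each profile has.

The total number of profiles should be distributed across the buckets below, where each bucket specifies the minimum and maximum number of transactions a profile may have.

Number of profiles with 0-1 transactions

Number of profiles with 2-5 transactions

Number of profiles with 6-20 transactions

Number of profiles with more than 20 transactions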

[
  {
    "profileId": "AVdiZnj6YuzD-vV0m9lx",
    "transactionId": "sdsfsdghfd"
  },
  {
    "profileId": "SRGDDUUDaasaddsaf",
    "transactionId": "asdadscfdvdvd"
  },
  {
    "profileId": "AVdiZnj6YuzD-vV0m9lx",
    "transactionId": "sdsacfsfcsafcs"
  }
]



The request below returns the number of transactions per profile, but an additional level of bucketing is required to group the profiles into their respective buckets using doc_count.

    {
        "size": 0,
        "aggs": {
            "profileTransactions": {
                "terms": {
                    "field": "profileId"
                }
            }
        }
    }

Response buckets:

    "buckets": [
        {
            "key": "AVdiZnj6YuzD-vV0m9lx",
            "doc_count": 2
        },
        {
            "key": "SRGDDUUDaasaddsaf",
            "doc_count": 1
        }
    ]

Any ideas?

Answer 1

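You can do the additional grouping with the help of the pipeline bucket selector aggregation. A value count aggregation is used because the bucket selector evaluates its condition against a numeric metric. This query requires ES 2.x.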

{
  "size": 0,
  "aggs": {
    "unique_profileId0": {
      "terms": {
        "field": "profileId"
      },
      "aggs": {
        "total_profile_count": {
          "value_count": {
            "field": "profileId"
          }
        },
        "range_0-1_bucket": {
          "bucket_selector": {
            "buckets_path": {
              "totalTransaction": "total_profile_count"
            },
            "script": "totalTransaction < 2"
          }
        }
      }
    },
    "unique_profileId1": {
      "terms": {
        "field": "profileId"
      },
      "aggs": {
        "total_profile_count": {
          "value_count": {
            "field": "profileId"
          }
        },
        "range_2-5_bucket": {
          "bucket_selector": {
            "buckets_path": {
              "totalTransaction": "total_profile_count"
            },
            "script": "totalTransaction >= 2 && totalTransaction <= 5"
          }
        }
      }
    },
    "unique_profileId2": {
      "terms": {
        "field": "profileId"
      },
      "aggs": {
        "total_profile_count": {
          "value_count": {
            "field": "profileId"
          }
        },
        "range_6-20_bucket": {
          "bucket_selector": {
            "buckets_path": {
              "totalTransaction": "total_profile_count"
            },
            "script": "totalTransaction >= 6 && totalTransaction <= 20"
          }
        }
      }
    },
    "unique_profileId3": {
      "terms": {
        "field": "profileId"
      },
      "aggs": {
        "total_profile_count": {
          "value_count": {
            "field": "profileId"
          }
        },
        "range_20_more_bucket": {
          "bucket_selector": {
            "buckets_path": {
              "totalTransaction": "total_profile_count"
            },
            "script": "totalTransaction > 20"
          }
        }
      }
    }
  }
}
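The four aggregations above differ only in their range condition, so the request body can be generated rather than written by hand. The following is a minimal Python sketch of that idea, not part of the original answer. The names `RANGES`, `range_script`, `build_query`, and the optional `terms_size` parameter are hypothetical. The generated scripts use inclusive bounds (`>= 21` instead of `> 20`), which is equivalent for whole-number counts. Note also that a terms aggregation returns only the 10 largest buckets by default, so a larger `size` is probably needed on real data.

```python
import json

# Transaction-count ranges from the question: (label, low, high).
# high=None means the range is open-ended. These names are illustrative.
RANGES = [
    ("0-1", 0, 1),
    ("2-5", 2, 5),
    ("6-20", 6, 20),
    ("20_more", 21, None),
]


def range_script(low, high, var="totalTransaction"):
    """Return a bucket_selector script for the inclusive range [low, high]."""
    if high is None:
        return f"{var} >= {low}"
    return f"{var} >= {low} && {var} <= {high}"


def build_query(ranges=RANGES, field="profileId", terms_size=None):
    """Build one terms aggregation per range, each filtered by a bucket_selector."""
    aggs = {}
    for index, (label, low, high) in enumerate(ranges):
        terms = {"field": field}
        if terms_size is not None:
            # Assumption: the caller wants more than the default 10 buckets.
            terms["size"] = terms_size
        aggs[f"unique_profileId{index}"] = {
            "terms": terms,
            "aggs": {
                # Metric that the bucket_selector reads through buckets_path.
                "total_profile_count": {"value_count": {"field": field}},
                f"range_{label}_bucket": {
                    "bucket_selector": {
                        "buckets_path": {"totalTransaction": "total_profile_count"},
                        "script": range_script(low, high),
                    }
                },
            },
        }
    return {"size": 0, "aggs": aggs}


# Print the request body so it can be pasted into a search request.
print(json.dumps(build_query(), indent=2))
```

Adding or changing a range then only requires editing the `RANGES` list.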

You need to enable dynamic scripting for this to work. Add the following two lines to the YML configuration file:

script.inline: on
script.indexed: on

Then restart each node.

Hope it helps!
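The query above returns, for each range, the list of profiles that fall into it rather than a single number. To get the number of profiles per range that the question asks for, the buckets can be counted on the client. Here is a minimal Python sketch, assuming the search response has already been parsed into a dict. The function name `count_profiles_per_range` is hypothetical, and `sample_response` is a hand-written illustration built from the three documents in the question, not real server output.

```python
def count_profiles_per_range(response):
    """Map each top-level aggregation name to the number of profile buckets it kept."""
    aggregations = response.get("aggregations", {})
    return {name: len(agg.get("buckets", [])) for name, agg in aggregations.items()}


# Hand-written sample shaped like a response to the query in this answer,
# using the two profiles from the question (2 and 1 transactions).
sample_response = {
    "aggregations": {
        "unique_profileId0": {
            "buckets": [
                {
                    "key": "SRGDDUUDaasaddsaf",
                    "doc_count": 1,
                    "total_profile_count": {"value": 1},
                }
            ]
        },
        "unique_profileId1": {
            "buckets": [
                {
                    "key": "AVdiZnj6YuzD-vV0m9lx",
                    "doc_count": 2,
                    "total_profile_count": {"value": 2},
                }
            ]
        },
        "unique_profileId2": {"buckets": []},
        "unique_profileId3": {"buckets": []},
    }
}

print(count_profiles_per_range(sample_response))
# {'unique_profileId0': 1, 'unique_profileId1': 1, 'unique_profileId2': 0, 'unique_profileId3': 0}
```

The counts are only as complete as the bucket lists, so they depend on the terms aggregation returning every profile.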
