Question

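I have an Elasticsearch index that stores telephone transactions (SMS, MMS, calls, etc.) and their associated costs.

The key of these documents is the MSISDN (MSISDN = phone number). In my application, there are groups of users, and each user can have one or more MSISDNs.

Here is the mapping for this type of document: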

"mappings" : {
      "cdr" : {
        "properties" : {
          "callDatetime" : {
            "type" : "long"
          },
          "callSource" : {
            "type" : "string"
          },
          "callType" : {
            "type" : "string"
          },
          "callZone" : {
            "type" : "string"
          },
          "calledNumber" : {
            "type" : "string"
          },
          "companyKey" : {
            "type" : "string"
          },
          "consumption" : {
            "properties" : {
              "data" : {
                "type" : "long"
              },
              "voice" : {
                "type" : "long"
              }
            }
          },
          "cost" : {
            "type" : "double"
          },
          "country" : {
            "type" : "string"
          },
          "included" : {
            "type" : "boolean"
          },
          "msisdn" : {
            "type" : "string"
          },
          "network" : {
            "type" : "string"
          }
        }
      }
    }

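My goal and my question:

My goal is a query that retrieves costs by callType, per group. But groups are not represented in Elasticsearch, only in my PostgreSQL database.

So I will write a method that retrieves all the MSISDNs of each existing group, which gives me something like a list of String arrays containing every MSISDN of each group.

Let's say I have something like this: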

"msisdn_by_group" : [
    {
       "group1" : ["01111111111", "02222222222", "03333333333", "04444444444"]
    },
    {
       "group2" : ["05555555555","06666666666"]
    }
]

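Now I will use this to generate the Elasticsearch query. I want an aggregation that sums the cost for all these terms in separate buckets (one per group), and then splits each bucket again by callType (to build a stacked bar chart).

I tried several things but did not manage to get it working (histogram, buckets, terms and sum are the main keywords I have been playing with).

If someone here can help me with the nesting order and the keywords I can use to achieve this, that would be great :) Thanks

EDIT: Here is my last attempt. Query: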

{
    "aggs" : {
        "cost_histogram": {
            "terms": {
                "field": "callType"
            },
            "aggs": {
                "cost_histogram_sum" : {
                    "sum": {
                        "field": "cost"
                    }
                }
            }
        }
    }
}

I get the expected result, but it is missing the "group" split, because I don't know how to pass the MSISDN arrays as a criterion:

Result:

"aggregations": {
    "cost_histogram": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "data",
          "doc_count": 5925,
          "cost_histogram_sum": {
            "value": 0
          }
        },
        {
          "key": "sms_mms",
          "doc_count": 5804,
          "cost_histogram_sum": {
            "value": 91.76999999999995
          }
        },
        {
          "key": "voice",
          "doc_count": 5299,
          "cost_histogram_sum": {
            "value": 194.1196
          }
        },
        {
          "key": "sms_mms_plus",
          "doc_count": 35,
          "cost_histogram_sum": {
            "value": 7.2976
          }
        }
      ]
    }
  }
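
For the stacked bar chart, each bucket in a result like the one above has to be reduced to a callType and its summed cost. Here is a minimal Python sketch of that step, assuming the response has already been parsed into a dict. The function name `cost_by_call_type` is hypothetical, and the sample data is an abbreviated copy of the result above.

```python
def cost_by_call_type(aggregations,
                      agg_name="cost_histogram",
                      sum_name="cost_histogram_sum"):
    """Map each callType bucket key to its summed cost."""
    buckets = aggregations[agg_name]["buckets"]
    return {bucket["key"]: bucket[sum_name]["value"] for bucket in buckets}


# Abbreviated copy of the "aggregations" section shown above.
sample = {
    "cost_histogram": {
        "buckets": [
            {"key": "data", "doc_count": 5925,
             "cost_histogram_sum": {"value": 0}},
            {"key": "voice", "doc_count": 5299,
             "cost_histogram_sum": {"value": 194.1196}},
        ]
    }
}

print(cost_by_call_type(sample))
```

Once the group split exists, the same function could be applied to each group's sub-aggregation to get one series per group.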

Answer 1

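OK, I found out how to do this with a single query. It is a long query, because the same block is repeated for each group, but I had no other choice. I am using the "filter" aggregation.

Here is a working example based on the array I wrote in the question above:

POST localhost:9200/cdr/_search?size=0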

{
    "query": {
        "term" : {
            "companyKey" : 1
        }   
    },
    "aggs" : {
        "group_1_split_cost": {
            "filter": {
                "bool": {
                    "should": [{
                        "bool": {
                            "must": {
                                "match": {
                                    "msisdn": "01111111111"
                                }
                            }
                        }
                    },{
                        "bool": {
                            "must": {
                                "match": {
                                    "msisdn": "02222222222"
                                }
                            }
                        }
                    },{
                        "bool": {
                            "must": {
                                "match": {
                                    "msisdn": "03333333333"
                                }
                            }
                        }
                    },{
                        "bool": {
                            "must": {
                                "match": {
                                    "msisdn": "04444444444"
                                }
                            }
                        }
                    }]
                }
            },
            "aggs": {
                "cost_histogram": {
                    "terms": {
                        "field": "callType"
                    },
                    "aggs": {
                        "cost_histogram_sum" : {
                            "sum": {
                                "field": "cost"
                            }
                        }
                    }
                }
            }
        },
        "group_2_split_cost": {
            "filter": {
                "bool": {
                    "should": [{
                        "bool": {
                            "must": {
                                "match": {
                                    "msisdn": "05555555555"
                                }
                            }
                        }
                    },{
                        "bool": {
                            "must": {
                                "match": {
                                    "msisdn": "06666666666"
                                }
                            }
                        }
                    }]
                }
            },
            "aggs": {
                "cost_histogram": {
                    "terms": {
                        "field": "callType"
                    },
                    "aggs": {
                        "cost_histogram_sum" : {
                            "sum": {
                                "field": "cost"
                            }
                        }
                    }
                }
            }
        }
    }
}

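Thanks to newer versions of Elasticsearch, we can now nest aggregations very deeply, but it is still a pity that I did not find a way to pass an array of values to an "OR" operator or something similar. I guess that would reduce the size of these queries, even if they are a bit unusual and only used in niche cases like mine.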
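
One way the query might be shortened is sketched below in Python. It assumes that the Elasticsearch version in use provides the `filters` aggregation (one named bucket per filter) and the `terms` query (which takes an array of values), and that each `msisdn` value is indexed as a single token. The function name `build_group_cost_query` and the aggregation name `cost_by_group` are hypothetical; this is not tested against the cluster from the question.

```python
import json


def build_group_cost_query(msisdn_by_group, company_key):
    """Build a search body with one named bucket per group.

    Assumption: `filters` aggregation and `terms` query are available
    in the target Elasticsearch version.
    """
    # One `terms` filter per group, keyed by the group name.
    group_filters = {
        group: {"terms": {"msisdn": msisdns}}
        for group, msisdns in msisdn_by_group.items()
    }
    return {
        "size": 0,
        "query": {"term": {"companyKey": company_key}},
        "aggs": {
            "cost_by_group": {
                "filters": {"filters": group_filters},
                "aggs": {
                    # Same callType split and cost sum as in the long query.
                    "cost_histogram": {
                        "terms": {"field": "callType"},
                        "aggs": {
                            "cost_histogram_sum": {"sum": {"field": "cost"}}
                        },
                    }
                },
            }
        },
    }


groups = {
    "group1": ["01111111111", "02222222222", "03333333333", "04444444444"],
    "group2": ["05555555555", "06666666666"],
}
body = build_group_cost_query(groups, 1)
print(json.dumps(body, indent=2))
```

With this shape the callType split is written once and applied to every group bucket, so the request grows only by one short `terms` entry per group.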

Elasticsearch aggregation by String array