Get all documents with a unique value in Elasticsearch

Time: 2018-03-21 05:58:57

Tags: elasticsearch

For example, I have many documents like this:

email status
1@123.com open
1@123.com click
2@123.com open
3@123.com open

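I want to query all documents whose email has only one status value, "open". Because the email "1@123.com" also has a "click" status, "1@123.com" should not be counted.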

I tried the query below, but it does not return what I expect:

{
  "aggs": {
    "hard_bounce_count": {
      "filter": {
        "term": {
          "actionStatus": "open"
        }
      },
      "aggs": {
        "email_count": {
          "value_count": {
            "field": "email"
          }
        }
      }
    }
  }
}

I expect a response like this:

2@123.com open
3@123.com open
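
To make the gap between the attempt and the expected result concrete, here is a minimal Python sketch over the four sample documents. It is only an in-memory simulation under the assumption that a filter plus value_count simply counts the matching documents; it does not call Elasticsearch, and all variable names are illustrative.

```python
# Sample documents from the question (email, status).
docs = [
    {"email": "1@123.com", "status": "open"},
    {"email": "1@123.com", "status": "click"},
    {"email": "2@123.com", "status": "open"},
    {"email": "3@123.com", "status": "open"},
]

# Roughly what filter + value_count does: keep "open" documents, then count them.
open_docs = [d for d in docs if d["status"] == "open"]
filtered_count = len(open_docs)

# What the question asks for: emails whose statuses are all "open".
statuses_by_email = {}
for d in docs:
    statuses_by_email.setdefault(d["email"], set()).add(d["status"])
only_open = sorted(e for e, s in statuses_by_email.items() if s == {"open"})

print(filtered_count)  # 3, because the "open" document of 1@123.com is included
print(only_open)       # ['2@123.com', '3@123.com']
```

The filter works per document, so it cannot know that "1@123.com" also has a "click" document. The condition has to be evaluated per email.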

How can I do this? Thanks.

2 answers:

Answer 0: (score: 0)

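Here, the outer terms aggregation (named EMAIL_LIST) returns the emails. Inside each email bucket, one filter aggregation (named OPEN) counts the documents whose status is "open", and a second filter aggregation (named OTHER_THAN_OPEN) counts the documents whose status is anything other than "open".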

{
   "size": 0,
   "aggs": {
      "EMAIL_LIST": {
         "terms": {
            "field": "email.keyword"
         },
         "aggs": {
            "OPEN": {
               "filter": {
                  "bool": {
                     "must": [
                        {
                           "term": {
                              "status": "open"
                           }
                        }
                     ]
                  }
               }
            },
            "OTHER_THAN_OPEN": {
               "filter": {
                  "bool": {
                     "must_not": [
                        {
                           "term": {
                              "status": "open"
                           }
                        }
                     ]
                  }
               }
            },
            "SELECTION_SCRIPT": {
               "bucket_selector": {
                  "buckets_path": {
                     "open_count": "OPEN._count",
                     "other_than_open_count": "OTHER_THAN_OPEN._count"
                  },
                  "script": "params.other_than_open_count==0 && params.open_count>0"
               }
            }
         }
      }
   }
}

The "bucket_selector" aggregation above then keeps only the buckets that contain nothing but "open" statuses:

 "aggregations": {
      "EMAIL_LIST": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "2@123.com",
               "doc_count": 1,
               "OTHER_THAN_OPEN": {
                  "doc_count": 0
               },
               "OPEN": {
                  "doc_count": 1
               }
            },
            {
               "key": "3@123.com",
               "doc_count": 1,
               "OTHER_THAN_OPEN": {
                  "doc_count": 0
               },
               "OPEN": {
                  "doc_count": 1
               }
            }
         ]
      }
   }

So the final answer is the emails "2@123.com" and "3@123.com".
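
The selection logic of this query can be sketched in plain Python. This is an in-memory imitation of the terms, filter and bucket_selector steps on the sample data from the question, not Elasticsearch client code; `select_only_open` is a hypothetical helper name.

```python
from collections import defaultdict

# Sample documents from the question.
docs = [
    {"email": "1@123.com", "status": "open"},
    {"email": "1@123.com", "status": "click"},
    {"email": "2@123.com", "status": "open"},
    {"email": "3@123.com", "status": "open"},
]


def select_only_open(documents):
    """Imitate: terms on email -> OPEN / OTHER_THAN_OPEN counts -> bucket_selector."""
    counts = defaultdict(lambda: {"open_count": 0, "other_than_open_count": 0})
    for doc in documents:
        # Each document lands in exactly one of the two per-email counters.
        if doc["status"] == "open":
            counts[doc["email"]]["open_count"] += 1
        else:
            counts[doc["email"]]["other_than_open_count"] += 1
    # Same condition as the bucket_selector script in the query above.
    return [
        email
        for email, c in counts.items()
        if c["other_than_open_count"] == 0 and c["open_count"] > 0
    ]


selected = select_only_open(docs)
print(selected)  # ['2@123.com', '3@123.com']
```

The bucket for "1@123.com" is dropped because its count of non-open documents is 1, which matches the response shown above.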

Answer 1: (score: 0)

I can also query it like this.

{
  "aggs": {
    "email": {
      "terms": {
        "field": "email"
      },
      "aggs": {
        "status_group": {
          "terms": {
            "field": "status"
          }
        }
      }
    }
  }
}

Response:

"aggregations": {
    "email": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [{
                "key": "1@123.com",
                "doc_count": 2,
                "status_group": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [{
                            "key": "click",
                            "doc_count": 1
                        }, {
                            "key": "open",
                            "doc_count": 1
                        }
                    ]
                }
            }, {
                "key": "2@123.com",
                "doc_count": 1,
                "status_group": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [{
                            "key": "open",
                            "doc_count": 1
                        }
                    ]
                }
            }, {
                "key": "3@123.com",
                "doc_count": 1,
                "status_group": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [{
                            "key": "open",
                            "doc_count": 1
                        }
                    ]
                }
            }
        ]
    }
}

But how can I exclude "1@123.com" from the result buckets? In the end I need statistics over all the documents that match the condition.
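
One possible way to drop "1@123.com" is to filter the buckets on the client after the response arrives. The sketch below assumes the response has already been parsed into a Python dict with the shape shown above; the dict is a trimmed copy of that response, and `only_status_buckets` is a hypothetical helper, not part of any Elasticsearch client.

```python
# Trimmed copy of the response shown above (only the fields used here).
response = {
    "aggregations": {
        "email": {
            "buckets": [
                {"key": "1@123.com", "doc_count": 2,
                 "status_group": {"buckets": [{"key": "click", "doc_count": 1},
                                              {"key": "open", "doc_count": 1}]}},
                {"key": "2@123.com", "doc_count": 1,
                 "status_group": {"buckets": [{"key": "open", "doc_count": 1}]}},
                {"key": "3@123.com", "doc_count": 1,
                 "status_group": {"buckets": [{"key": "open", "doc_count": 1}]}},
            ]
        }
    }
}


def only_status_buckets(resp, wanted="open"):
    """Keep email buckets whose status_group holds exactly one key: `wanted`."""
    kept = []
    for bucket in resp["aggregations"]["email"]["buckets"]:
        keys = [s["key"] for s in bucket["status_group"]["buckets"]]
        if keys == [wanted]:
            kept.append(bucket)
    return kept


matching = only_status_buckets(response)
emails = [b["key"] for b in matching]
total_docs = sum(b["doc_count"] for b in matching)
print(emails, total_docs)
```

Client-side filtering only sees the buckets that the terms aggregation actually returned, so with many distinct emails the "size" of the terms aggregation matters. Doing the selection on the server with a bucket_selector, as in Answer 0, avoids transferring the unwanted buckets.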