Elasticsearch group-by query with custom aggregation logic

Time: 2019-05-17 16:36:11

Tags: elasticsearch

I have two fields in Elasticsearch, URI and BROWSER.

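I want to find all URIs that have not been hit by a particular browser, for example chrome. The query I want to write should:

1. Group by URI,
2. find the distinct BROWSER set,
3. filter URIs where chrome is not in the BROWSER set,

and return the result.

I have completed the query for the first step. In that query, I have not implemented steps 2 and 3.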
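To pin down the expected result before translating it into an Elasticsearch query, the three steps can be sketched in plain Python over in-memory records. This is only an illustration: the function name uris_not_hit_by and the sample records (including www.example.com) are hypothetical and not part of the original data.

```python
from collections import defaultdict


def uris_not_hit_by(docs, browser):
    """Return the URIs whose distinct browser set does not contain `browser`."""
    # Steps 1 and 2: group by URI and collect the distinct browsers per URI.
    browsers_by_uri = defaultdict(set)
    for doc in docs:
        browsers_by_uri[doc["URI"]].add(doc["BROWSER"])
    # Step 3: keep only the URIs that the given browser never hit.
    return sorted(
        uri for uri, browsers in browsers_by_uri.items() if browser not in browsers
    )


# Hypothetical sample records, for illustration only.
docs = [
    {"URI": "www.google.com", "BROWSER": "chrome"},
    {"URI": "www.google.com", "BROWSER": "firefox"},
    {"URI": "www.example.com", "BROWSER": "firefox"},
]

print(uris_not_hit_by(docs, "chrome"))  # ['www.example.com']
```

Here www.google.com is dropped entirely because one of its hits came from chrome, even though it also has a firefox hit.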

1 answer:

Answer 0: (score: 0)

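There are basically two ways to achieve this.

Solution 1: Using Elasticsearch DSL

I use a Bool Query to exclude the documents whose browser is chrome, and then apply two nested Terms Aggregations to get the grouping you are looking for. Filtering in the query first is more efficient than applying a filter inside the aggregation, because the aggregations then run only on the documents that match the query.

The query is structured as follows: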

- Bool Query
- Terms Aggregation (Parent for uri)
  - Terms Aggregation (Child for browsers)
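As a rough illustration of what this structure computes, the same filter-then-aggregate logic can be sketched in plain Python. The function name filter_then_aggregate is hypothetical, and the three records mirror the sample documents used in this answer.

```python
from collections import Counter, defaultdict


def filter_then_aggregate(docs, excluded_browser):
    """Drop documents for one browser, then count browsers per URI."""
    counts = defaultdict(Counter)
    for doc in docs:
        # Equivalent of the bool/must_not clause: skip the excluded browser.
        if doc["browser"] == excluded_browser:
            continue
        # Parent bucket is the uri, child bucket is the browser.
        counts[doc["uri"]][doc["browser"]] += 1
    return {uri: dict(per_browser) for uri, per_browser in counts.items()}


sample_docs = [
    {"uri": "www.google.com", "browser": "chrome"},
    {"uri": "www.google.com", "browser": "firefox"},
    {"uri": "www.google.com", "browser": "iexplorer"},
]

result = filter_then_aggregate(sample_docs, "chrome")
print(result)  # {'www.google.com': {'firefox': 1, 'iexplorer': 1}}
```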

Note that I am assuming your fields uri and browser are both of type keyword.
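This matters because a Terms Aggregation on an analyzed text field fails unless fielddata is enabled, and dynamic mapping indexes strings as text with a .keyword sub-field. A minimal sketch of an explicit mapping, written as the Python dict that would be sent as the body of PUT myindex, could look like this. It assumes a 6.x cluster and the mapping type mydocs used in the sample documents; from 7.x on, mapping types were removed and the mydocs level would be dropped.

```python
import json

# Assumed index body for `PUT myindex` on a 6.x cluster (mapping type "mydocs").
index_body = {
    "mappings": {
        "mydocs": {
            "properties": {
                "uri": {"type": "keyword"},      # exact-value field, aggregatable
                "browser": {"type": "keyword"},  # exact-value field, aggregatable
            }
        }
    }
}

print(json.dumps(index_body, indent=2))
```

If the index was created through dynamic mapping instead, the aggregations would probably need to target uri.keyword and browser.keyword.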

Sample documents:

POST myindex/mydocs/1
{
  "uri": "www.google.com",
  "browser": "chrome"
}

POST myindex/mydocs/2
{
  "uri": "www.google.com",
  "browser": "firefox"
}

POST myindex/mydocs/3
{
  "uri": "www.google.com",
  "browser": "iexplorer"
}

Query:

POST myindex/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must_not": [
        {
          "match": {
            "browser": "chrome"
          }
        }
      ]
    }
  }, 
  "aggs": {
    "myuri": {
      "terms": {
        "field": "uri",
        "size": 10
      },
      "aggs": {
        "mybrowsers": {
          "terms": {
            "field": "browser",
            "size": 10
          }
        }
      }
    }
  }
}

Response:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "myuri": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "www.google.com",
          "doc_count": 2,
          "mybrowsers": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "firefox",
                "doc_count": 1
              },
              {
                "key": "iexplorer",
                "doc_count": 1
              }
            ]
          }
        }
      ]
    }
  }
}
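Note that www.google.com is still returned in this response although document 1 records a chrome hit for it: the Bool Query drops the chrome documents, not the whole URI. If URIs with any chrome hit must be excluded entirely, one possible sketch is to count the chrome hits per URI with a filter sub-aggregation and discard the buckets where that count is not zero using a bucket_selector pipeline aggregation. The helper names build_strict_query and uris_from_response and the aggregation names excluded_hits and only_unhit are my own, and the body has not been run against a cluster.

```python
import json


def build_strict_query(excluded_browser, size=10):
    """Build a search body that keeps only URIs never hit by `excluded_browser`."""
    return {
        "size": 0,
        "aggs": {
            "myuri": {
                "terms": {"field": "uri", "size": size},
                "aggs": {
                    # Number of documents in this URI bucket from the excluded browser.
                    "excluded_hits": {
                        "filter": {"term": {"browser": excluded_browser}}
                    },
                    # Distinct browsers seen for this URI.
                    "mybrowsers": {"terms": {"field": "browser", "size": size}},
                    # Keep the bucket only when the excluded browser never appears.
                    "only_unhit": {
                        "bucket_selector": {
                            "buckets_path": {"hits": "excluded_hits._count"},
                            "script": "params.hits == 0",
                        }
                    },
                },
            }
        },
    }


def uris_from_response(response):
    """Extract the URI keys from the myuri buckets of a search response."""
    return [bucket["key"] for bucket in response["aggregations"]["myuri"]["buckets"]]


body = build_strict_query("chrome")
print(json.dumps(body, indent=2))
```

There is no must_not clause here, because the chrome documents have to stay visible for the count to work. The size of the outer Terms Aggregation limits how many URI buckets are examined, so it may need to be raised for indices with many distinct URIs.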

Solution 2: Using Elasticsearch SQL Access

If you are using X-Pack and would like to achieve this via SQL Access, your query translates into a simple SQL query, as shown below:

POST /_xpack/sql?format=txt
{
  "query": "SELECT uri, browser, count(1) FROM myindex WHERE browser <> 'chrome' GROUP BY uri, browser"
}

Response:

      uri      |    browser    |   COUNT(1)    
---------------+---------------+---------------  
www.google.com |firefox        |1              
www.google.com |iexplorer      |1    

Let me know if this helps!