Is it possible to perform a user count/cardinality with logical relations in ElasticSearch?

Time: 2016-04-12 05:19:04

Tags: elasticsearch count cardinality

My user documents have the following format:

{
    userId: "<userId>",
    userAttributes: [
        "<Attribute1>",
        "<Attribute2>",
        ...
        "<AttributeN>"
    ]
}

I want to get the number of unique users that satisfy a logical statement, for example: how many users have attribute1 AND attribute2 OR attribute3?

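I have read about the cardinality function in the cardinality-aggregation documentation, but it seems to apply to a single value and lacks the logical "AND" and "OR" capabilities.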

Note that I have around 1,000,000,000 documents and I need the results as fast as possible, which is why I am looking at cardinality estimation.

1 answer:

Answer 0: (score: 1)

Assuming userAttributes is a simple array of strings (analyzed in my case, but each value is a single lowercase term), how about this attempt:

POST /users/user/_bulk
{"index":{"_id":1}}
{"userId":123,"userAttributes":["xxx","yyy","zzz"]}
{"index":{"_id":2}}
{"userId":234,"userAttributes":["xxx","yyy","aaa"]}
{"index":{"_id":3}}
{"userId":345,"userAttributes":["xxx","yyy","bbb"]}
{"index":{"_id":4}}
{"userId":456,"userAttributes":["xxx","ccc","zzz"]}
{"index":{"_id":5}}
{"userId":567,"userAttributes":["xxx","ddd","ooo"]}

GET /users/user/_search
{
  "query": {
    "query_string": {
      "query": "userAttributes:(((xxx AND yyy) NOT zzz) OR ooo)"
    }
  },
  "aggs": {
    "unique_ids": {
      "cardinality": {
        "field": "userId"
      }
    }
  }
}

This returns the following:

  "hits": [
     {
        "_index": "users",
        "_type": "user",
        "_id": "2",
        "_score": 0.16471066,
        "_source": {
           "userAttributes": [
              "xxx",
              "yyy",
              "aaa"
           ]
        }
     },
     {
        "_index": "users",
        "_type": "user",
        "_id": "3",
        "_score": 0.04318809,
        "_source": {
           "userAttributes": [
              "xxx",
              "yyy",
              "bbb"
           ]
        }
     },
     {
        "_index": "users",
        "_type": "user",
        "_id": "5",
        "_score": 0.021594046,
        "_source": {
           "userAttributes": [
              "xxx",
              "ddd",
              "ooo"
           ]
        }
     }
  ]
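
As a sanity check on the logic, here is a minimal Python sketch (not part of the original answer) that evaluates the same expression, ((xxx AND yyy) NOT zzz) OR ooo, against the five sample documents locally and counts the distinct userId values. It also builds a hypothetical request body that expresses the same condition as a bool query in filter context instead of query_string. That body is an assumption: it relies on each attribute being indexed as a single lowercase term, and the size and precision_threshold values are illustrative, not tuned.

```python
import json

# The five sample documents from the bulk request above, keyed by _id.
DOCS = {
    1: {"userId": 123, "userAttributes": ["xxx", "yyy", "zzz"]},
    2: {"userId": 234, "userAttributes": ["xxx", "yyy", "aaa"]},
    3: {"userId": 345, "userAttributes": ["xxx", "yyy", "bbb"]},
    4: {"userId": 456, "userAttributes": ["xxx", "ccc", "zzz"]},
    5: {"userId": 567, "userAttributes": ["xxx", "ddd", "ooo"]},
}


def matches(attributes):
    """Evaluate ((xxx AND yyy) NOT zzz) OR ooo on one attribute list."""
    attrs = set(attributes)
    return ({"xxx", "yyy"} <= attrs and "zzz" not in attrs) or "ooo" in attrs


# Local evaluation: which documents match, and how many distinct users.
matching_ids = sorted(i for i, d in DOCS.items() if matches(d["userAttributes"]))
unique_users = len({DOCS[i]["userId"] for i in matching_ids})


def term(value):
    """Hypothetical helper: a term clause on userAttributes."""
    return {"term": {"userAttributes": value}}


# Assumed alternative request body: the same logic as a bool query.
body = {
    "size": 0,  # return only the aggregation, no hits
    "query": {"bool": {"filter": [{"bool": {
        "should": [
            {"bool": {"must": [term("xxx"), term("yyy")],
                      "must_not": [term("zzz")]}},
            term("ooo"),
        ],
        "minimum_should_match": 1,
    }}]}},
    "aggs": {"unique_ids": {"cardinality": {
        "field": "userId",
        "precision_threshold": 40000,  # illustrative value, trade memory for accuracy
    }}},
}

print(matching_ids, unique_users)
print(json.dumps(body, indent=2))
```

The local evaluation selects documents 2, 3 and 5, which agrees with the hits listed above, so the distinct user count for this sample is 3. If every user is stored as exactly one document, the total hit count of the filtered query is already an exact count and the cardinality aggregation is only needed when a userId can appear in more than one document.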