Get all parents that have more than 'N' children

Date: 2014-01-13 18:51:38

Tags: elasticsearch

I have searched for an answer but unfortunately could not find one. I have an index that contains a users type:

    users: {
    properties: {

        loginKey: {
            type: string
        }

        timeZone: {
            type: long
        }
        maxEmailsPerWeek: {
            type: long
        }

        joinDate: {
            format: dateOptionalTime
            type: date
        }
        preferredEntityId: {
            type: long
        }
        partition: {
            type: long
        }
        postalCode: {
            type: string
        }
        nickName: {
            type: string
        }
        announcements: {
            type: long
        }

        gender: {
            type: string
        }

        birthDate: {
            format: dateOptionalTime
            type: date
        }
        firstName: {
            type: string
        }
        emailTestId: {
            type: long
        }
        emailStateDate: {
            format: dateOptionalTime
            type: date
        }
        lastName: {
            type: string
        }
        emailAddress: {
            type: string
        }
...
    }
}

There is also an activity type, which is a child of users:

    activity: {
    _routing: {
        required: true
    }
    properties: {
        eventTimestamp: {
            format: dateOptionalTime
            type: date
        }
        userAgent: {
            type: string
        }
        recordType: {
            type: string
        }
        universalTrackingParams: {
            properties: {
                MODULE_ID: {
                    type: string
                }
                TRACKING_CODE: { // this is a unique user identifier
                    index: not_analyzed
                    omit_norms: true
                    index_options: docs
                    type: string
                }
                SENDING_DOMAIN_PARAM: {
                    index: not_analyzed
                    omit_norms: true
                    index_options: docs
                    type: string
                }
                PRODUCT_ID: {
                    type: string
                }
                TEST_ID: {
                    type: string
                }
                MAILING_ID: {
                    type: string
                }
                NEWS_LETTER_ID: {
                    type: string
                }
                LINK_POSITION: {
                    type: integer
                }
                DECORATION_TIMESTAMP: {
                    type: string
                }
                SITE_ID: {
                    type: string
                }
                TEMPLATE_VERSION: {
                    type: string
                }
                ORIGINAL_LINK: {
                    index: not_analyzed
                    omit_norms: true
                    index_options: docs
                    type: string
                }
            }
        }
        ip: {
            index: not_analyzed
            omit_norms: true
            index_options: docs
            type: string
        }
    }
    _parent: {
        type: users
    }
}

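What I want to do is search for all parents that have N children. In other words, I want to get all user records that had activity (more than N times) within a given time period (eventTimestamp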

Can anyone suggest a resource I can read, or a query that achieves this?

Update: this is what I ended up doing (using the index and types created by Sloan Ahrens below):

{
  "min_score": 2,
  "query": {
    "top_children": {
      "type": "order",
      "score": "sum",
      "query": {
        "constant_score": {
          "query": {
            "match_all": {}
          }
        }
      }
    }
  }
}

This gets me all customers that have at least 3 orders (thanks to imotov).
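The query above can also be assembled programmatically. The following is a minimal Python sketch, not the asker's code: the helper name `build_top_children_query` and its parameters are hypothetical, and the optional date restriction assumes that `constant_score` accepts a `filter` holding a `range` filter on the child's `date` field, as in the Elasticsearch versions that still shipped `top_children`. It only builds the request body and does not contact a cluster.

```python
import json


def build_top_children_query(child_type, min_score, date_field=None, date_from=None):
    """Build a top_children request body shaped like the one above.

    Each matching child is meant to contribute a constant score, and
    "score": "sum" adds those up per parent, so min_score acts as a
    threshold on the summed child score.
    """
    if date_field is not None and date_from is not None:
        # Assumption: only children on or after date_from should count.
        inner = {
            "constant_score": {
                "filter": {"range": {date_field: {"from": date_from}}}
            }
        }
    else:
        inner = {"constant_score": {"query": {"match_all": {}}}}
    return {
        "min_score": min_score,
        "query": {
            "top_children": {"type": child_type, "score": "sum", "query": inner}
        },
    }


body = build_top_children_query("order", 2)
print(json.dumps(body, indent=2))
```

Passing `date_field="date"` and a `date_from` value would restrict the counted children to a time window, which is closer to what the question asks for; whether the resulting scores line up exactly with child counts should be checked against a real index.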

1 Answer:

Answer 0 (score: 2)

Well, this is certainly not a completely satisfying solution, since it requires two queries, but I think you can get what you want using facets.

Simplifying a bit (and using the schema/data from this blog post), I'll start by creating a simple index with a parent/child relationship:

curl -XPUT "http://localhost:9200/orders" -d'
{
    "mappings": { 
        "customer": {},
        "order" : {
            "_parent" : {
                "type" : "customer"
            }
        }
    }
}'

Then add some data:

curl -XPOST "http://localhost:9200/orders/_bulk" -d'
{ "index" : { "_type" : "customer", "_id" : "john" } }
{ "name" : "John Doe" }
{ "index" : { "_type" : "order", "_parent" : "john" } }
{ "date" : "2013-10-15T12:00:00" }
{ "index" : { "_type" : "order", "_parent" : "john" } }
{ "date" : "2013-11-15T12:00:00" }
{ "index" : { "_type" : "order", "_parent" : "john" } }
{ "date" : "2013-12-01T12:00:00" }
{ "index" : { "_type" : "customer", "_id" : "jane" } }
{ "name" : "Jane Doe" }
{ "index" : { "_type" : "order", "_parent" : "jane" } }
{ "date" : "2013-11-20T12:00:00" }
{ "index" : { "_type" : "customer", "_id" : "bob" } }
{ "name" : "Bob Doe" }
{ "index" : { "_type" : "order", "_parent" : "bob" } }
{ "date" : "2013-09-20T12:00:00" }
'
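For larger data sets, the bulk body is easier to generate than to type. The sketch below is one possible way to do that in Python; the function name `build_bulk_payload` and the shape of its input dictionary are assumptions made for illustration. It reproduces the same action/source line pairs as the request above and leaves the actual HTTP call to the caller.

```python
import json


def build_bulk_payload(customers):
    """Return a newline-delimited body for the _bulk endpoint.

    `customers` maps a customer id to a (name, list of order dates) pair.
    """
    lines = []
    for customer_id, (name, order_dates) in customers.items():
        # Parent document: action line followed by its source line.
        lines.append({"index": {"_type": "customer", "_id": customer_id}})
        lines.append({"name": name})
        for date in order_dates:
            # Child document: "_parent" links the order to its customer.
            lines.append({"index": {"_type": "order", "_parent": customer_id}})
            lines.append({"date": date})
    # The bulk API expects one JSON object per line and a trailing newline.
    return "\n".join(json.dumps(line) for line in lines) + "\n"


payload = build_bulk_payload({
    "john": ("John Doe", ["2013-10-15T12:00:00",
                          "2013-11-15T12:00:00",
                          "2013-12-01T12:00:00"]),
    "jane": ("Jane Doe", ["2013-11-20T12:00:00"]),
    "bob": ("Bob Doe", ["2013-09-20T12:00:00"]),
})
print(payload)
```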

Then I can run a terms facet on the "_parent" field of the order type, filtering the faceted documents by date:

curl -XPOST "http://localhost:9200/orders/order/_search" -d'
{
    "size": 0, 
    "facets": {
       "customers": {
          "terms": {
              "field": "_parent"
          },
          "facet_filter": {
              "range": {
                    "date": {
                        "from": "2013-11-01T00:00:00"
                    }
                }
          }
       }
    }
}'
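Since the question asks about a bounded time period, the facet request can be parameterized by a start and an optional end date. This is a hedged Python sketch: `build_parent_facet_body` and its arguments are hypothetical names, the `to` bound and the explicit `size` are additions that are not in the request above, and the code only builds the JSON body.

```python
import json


def build_parent_facet_body(start, end=None, facet_name="customers", size=10):
    """Build a terms facet on "_parent", restricted to a date range."""
    date_range = {"from": start}
    if end is not None:
        # Assumption: an upper bound is wanted as well as a lower one.
        date_range["to"] = end
    return {
        "size": 0,  # no hits needed, only the facet counts
        "facets": {
            facet_name: {
                # size caps how many parents the facet reports
                "terms": {"field": "_parent", "size": size},
                "facet_filter": {"range": {"date": date_range}},
            }
        },
    }


body = build_parent_facet_body("2013-11-01T00:00:00", "2013-12-31T23:59:59")
print(json.dumps(body, indent=2))
```

The terms facet returns only a limited number of entries, so with many parents the `size` value would need to be raised for the counts to cover everyone of interest.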

That gives me the following response:

{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 5,
      "max_score": 1,
      "hits": []
   },
   "facets": {
      "customers": {
         "_type": "terms",
         "missing": 0,
         "total": 3,
         "other": 0,
         "terms": [
            {
               "term": "customer#john",
               "count": 2
            },
            {
               "term": "customer#jane",
               "count": 1
            }
         ]
      }
   }
}

I can then use the returned IDs to retrieve the customer documents with a second query:

curl -XPOST "http://localhost:9200/orders/_search" -d'
{
   "query": {
      "ids": {
         "type": "customer",
         "values": [
            "john",
            "jane"
         ]
      }
   }
}'

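You will have to add your own logic between the last two steps to decide, based on the facet counts, which customers to retrieve, but the approach should work in this context.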
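The glue logic between the two requests could look roughly like the Python sketch below. The helper names `select_parent_ids` and `build_ids_query` are hypothetical, and the sample dictionary is just the `facets` part of the response shown earlier. It keeps the parents whose facet count reaches a threshold, strips the `customer#` prefix from each term, and builds the body of the `ids` query.

```python
def select_parent_ids(response, facet_name, min_count):
    """Return the ids of parents whose facet count is at least min_count."""
    ids = []
    for entry in response["facets"][facet_name]["terms"]:
        if entry["count"] >= min_count:
            # Facet terms look like "customer#john"; keep the id part.
            _, sep, parent_id = entry["term"].partition("#")
            ids.append(parent_id if sep else entry["term"])
    return ids


def build_ids_query(parent_type, ids):
    """Build the second request body that fetches the parents by id."""
    return {"query": {"ids": {"type": parent_type, "values": ids}}}


# The "facets" section of the response shown above.
response = {
    "facets": {
        "customers": {
            "terms": [
                {"term": "customer#john", "count": 2},
                {"term": "customer#jane", "count": 1},
            ]
        }
    }
}

ids = select_parent_ids(response, "customers", 2)
print(build_ids_query("customer", ids))
```

With a threshold of 1, both parents in the sample would be selected, which matches the `ids` query in the answer.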

Here is a runnable example you can play with: http://sense.qbox.io/gist/9ebde72ccffa0dce654383ad4fb0a8451b74a9f7