Create a flat array from Elasticsearch query results

Time: 2015-02-15 18:57:10

Tags: elasticsearch

I have an index containing documents like the following (simplified):

{
    "user" : "j.johnson",
    "certifications" : [{
            "certification_date" : "2013-02-09T00:00:00+03:00",
            "previous_level" : "No Level",
            "obtained_level" : "Junior"
        }, {
            "certification_date" : "2014-05-26T00:00:00+03:00",
            "previous_level" : "Junior",
            "obtained_level" : "Middle"
        }
    ]
}

I want a flat list of all certifications passed by all users where certification_date is later than January 1, 2014. It would be one very large array:

[{
        "certification_date" : "2014-09-08T00:00:00+03:00",
        "previous_level" : "No Level",
        "obtained_level" : "Junior"
    }, {
        "certification_date" : "2014-05-26T00:00:00+03:00",
        "previous_level" : "Junior",
        "obtained_level" : "Middle"
    }, {
        "certification_date" : "2015-01-26T00:00:00+03:00",
        "previous_level" : "Junior",
        "obtained_level" : "Middle"
    }
    ...
]

This doesn't seem like a difficult task, but I can't find a simple way to do it.

1 answer:

Answer 0: (score: 1)

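I would implement this with a parent/child relationship, but you will have to reorganize your data. I don't think you can get what you want with your current schema.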

More specifically, I set up an index like this, with user as the parent type and certification as the child type:

PUT /test_index
{
   "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0
   },
   "mappings": {
      "user": {
         "properties": {
            "user_name": { "type": "string" }
         }
      },
      "certification":{
          "_parent": { "type": "user" },
          "properties": {
              "certification_date": { "type": "date" },
              "previous_level": { "type": "string" },
              "obtained_level": { "type": "string" }
          }
      }
   }
}

Then I added some documents:

POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"user","_id":1}}
{"user_name":"j.johnson"}
{"index":{"_index":"test_index","_type":"certification","_parent":1}}
{"certification_date" : "2013-02-09T00:00:00+03:00","previous_level" : "No Level","obtained_level" : "Junior"}
{"index":{"_index":"test_index","_type":"certification","_parent":1}}
{"certification_date" : "2014-05-26T00:00:00+03:00","previous_level" : "Junior","obtained_level" : "Middle"}
{"index":{"_index":"test_index","_type":"user","_id":2}}
{ "user_name":"b.bronson"}
{"index":{"_index":"test_index","_type":"certification","_parent":2}}
{"certification_date" : "2013-09-05T00:00:00+03:00","previous_level" : "No Level","obtained_level" : "Junior"}
{"index":{"_index":"test_index","_type":"certification","_parent":2}}
{"certification_date" : "2014-07-20T00:00:00+03:00","previous_level" : "Junior","obtained_level" : "Middle"}
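
If the existing data is in the nested shape from the question, the bulk body above could be generated rather than written by hand. The following is only a sketch under assumptions: the function name to_bulk_body is hypothetical, user ids are simply assigned by position, and the output mirrors the action/source line pairs shown above for the test_index used in this answer.

```python
import json


def to_bulk_body(users, index="test_index"):
    """Build a _bulk request body from nested user documents (hypothetical helper)."""
    lines = []
    for user_id, doc in enumerate(users, start=1):
        # Parent document: one per user. Ids are assigned by position (an assumption).
        lines.append({"index": {"_index": index, "_type": "user", "_id": user_id}})
        lines.append({"user_name": doc["user"]})
        # Child documents: one per certification, linked to the user via _parent.
        for cert in doc.get("certifications", []):
            lines.append(
                {"index": {"_index": index, "_type": "certification", "_parent": user_id}}
            )
            lines.append(cert)
    # The bulk API expects newline-delimited JSON, ending with a newline.
    return "".join(json.dumps(line) + "\n" for line in lines)


# The sample document from the question.
users = [
    {
        "user": "j.johnson",
        "certifications": [
            {
                "certification_date": "2013-02-09T00:00:00+03:00",
                "previous_level": "No Level",
                "obtained_level": "Junior",
            },
            {
                "certification_date": "2014-05-26T00:00:00+03:00",
                "previous_level": "Junior",
                "obtained_level": "Middle",
            },
        ],
    }
]

body = to_bulk_body(users)
print(body)
```

The resulting string would be sent as the body of POST /test_index/_bulk with whatever HTTP client you already use.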

Now I can search the certifications with a range filter:

POST /test_index/certification/_search
{
   "query": {
      "constant_score": {
         "filter": {
            "range": {
               "certification_date": {
                  "gte": "2014-01-01"
               }
            }
         }
      }
   }
}
...
{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 1,
      "hits": [
         {
            "_index": "test_index",
            "_type": "certification",
            "_id": "QGXHp7JZTeafWYzb_1FZiA",
            "_score": 1,
            "_source": {
               "certification_date": "2014-05-26T00:00:00+03:00",
               "previous_level": "Junior",
               "obtained_level": "Middle"
            }
         },
         {
            "_index": "test_index",
            "_type": "certification",
            "_id": "yvO2A9JaTieI5VHVRikDfg",
            "_score": 1,
            "_source": {
               "certification_date": "2014-07-20T00:00:00+03:00",
               "previous_level": "Junior",
               "obtained_level": "Middle"
            }
         }
      ]
   }
}

This structure is still not quite as flat as you asked for, but I think it is as close as ES will let you get.
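
The remaining step, pulling each _source out of the hits, can be done on the client. A minimal sketch in Python, assuming the search response has already been parsed into a dict; flatten_hits is a hypothetical helper name, not part of any Elasticsearch client, and the response below is a trimmed copy of the one shown above.

```python
def flatten_hits(response):
    """Return the _source of every hit as one flat list."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


# Trimmed copy of the search response from this answer.
response = {
    "hits": {
        "total": 2,
        "hits": [
            {
                "_type": "certification",
                "_source": {
                    "certification_date": "2014-05-26T00:00:00+03:00",
                    "previous_level": "Junior",
                    "obtained_level": "Middle",
                },
            },
            {
                "_type": "certification",
                "_source": {
                    "certification_date": "2014-07-20T00:00:00+03:00",
                    "previous_level": "Junior",
                    "obtained_level": "Middle",
                },
            },
        ],
    }
}

flat = flatten_hits(response)
print(flat)
```

Note that a search returns only the first page of hits by default, so a very large result set would have to be fetched in several requests and the pages concatenated.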

Here is the code I used:

http://sense.qbox.io/gist/3c733ec75e6c0856fa2772cc8f67bd7c00aba637