How to query this data using Elasticsearch

Asked: 2015-03-11 01:20:13

Tags: elasticsearch lucene

I am trying to find the oldest male in each family. Every person in the result must be at least 18 years old.

Here is the data, as CSV:

id     FamilyId     LastName         FirstName         Age     Gender
1      1            Smith            John              20      M
2      1            Smith            Joan              20      F
3      1            Smith            Harry             1       M
4      2            Ross             Pie               33      F
5      2            Ross             Norman            30      M
6      2            Ross             Devan             13      M
7      2            Ross             Debra             9       F
8      2            Ross             Terry             9       F
9      3            Johnson          Mary              25      F
10     4            King             Bob               5       M

The same data as JSON:

[
  {
    "id":1,
    "FamilyId":1,
    "LastName":"Smith",
    "FirstName":"John",
    "Age":20,
    "Gender":"M"
  },
  {
    "id":2,
    "FamilyId":1,
    "LastName":"Smith",
    "FirstName":"Joan",
    "Age":20,
    "Gender":"F"
  },
  {
    "id":3,
    "FamilyId":1,
    "LastName":"Smith",
    "FirstName":"Harry",
    "Age":1,
    "Gender":"M"
  },
  {
    "id":4,
    "FamilyId":2,
    "LastName":"Ross",
    "FirstName":"Pie",
    "Age":33,
    "Gender":"F"
  },
  {
    "id":5,
    "FamilyId":2,
    "LastName":"Ross",
    "FirstName":"Norman",
    "Age":30,
    "Gender":"M"
  },
  {
    "id":6,
    "FamilyId":2,
    "LastName":"Ross",
    "FirstName":"Devan",
    "Age":13,
    "Gender":"M"
  },
  {
    "id":7,
    "FamilyId":2,
    "LastName":"Ross",
    "FirstName":"Debra",
    "Age":9,
    "Gender":"F"
  },
  {
    "id":8,
    "FamilyId":2,
    "LastName":"Ross",
    "FirstName":"Terry",
    "Age":9,
    "Gender":"F"
  },
  {
    "id":9,
    "FamilyId":3,
    "LastName":"Johnson",
    "FirstName":"Mary",
    "Age":25,
    "Gender":"F"
  },
  {
    "id":10,
    "FamilyId":4,
    "LastName":"King",
    "FirstName":"Bob",
    "Age":5,
    "Gender":"M"
  }
]

Here is the result I expect:

id     FamilyId     LastName         FirstName         Age     Gender
1      1            Smith            John              20      M
5      2            Ross             Norman            30      M

The expected result as JSON:

[
  {
    "id":1,
    "FamilyId":1,
    "LastName":"Smith",
    "FirstName":"John",
    "Age":20,
    "Gender":"M"
  },
  {
    "id":5,
    "FamilyId":2,
    "LastName":"Ross",
    "FirstName":"Norman",
    "Age":30,
    "Gender":"M"
  }
]

If it makes the result too hard to obtain, I can do without the id field in the output. Is a query like this possible with Elasticsearch?
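
To pin down the requirement, the selection rule can be restated outside Elasticsearch as a few lines of Python. This is only an illustrative sketch of the rule (male, at least 18, oldest per FamilyId) applied to the sample data above; the function name and the tie-breaking behavior (the first record seen wins when ages are equal) are assumptions, not part of the original question.

```python
from typing import Dict, List

# Sample data copied from the question.
PEOPLE: List[dict] = [
    {"id": 1, "FamilyId": 1, "LastName": "Smith", "FirstName": "John", "Age": 20, "Gender": "M"},
    {"id": 2, "FamilyId": 1, "LastName": "Smith", "FirstName": "Joan", "Age": 20, "Gender": "F"},
    {"id": 3, "FamilyId": 1, "LastName": "Smith", "FirstName": "Harry", "Age": 1, "Gender": "M"},
    {"id": 4, "FamilyId": 2, "LastName": "Ross", "FirstName": "Pie", "Age": 33, "Gender": "F"},
    {"id": 5, "FamilyId": 2, "LastName": "Ross", "FirstName": "Norman", "Age": 30, "Gender": "M"},
    {"id": 6, "FamilyId": 2, "LastName": "Ross", "FirstName": "Devan", "Age": 13, "Gender": "M"},
    {"id": 7, "FamilyId": 2, "LastName": "Ross", "FirstName": "Debra", "Age": 9, "Gender": "F"},
    {"id": 8, "FamilyId": 2, "LastName": "Ross", "FirstName": "Terry", "Age": 9, "Gender": "F"},
    {"id": 9, "FamilyId": 3, "LastName": "Johnson", "FirstName": "Mary", "Age": 25, "Gender": "F"},
    {"id": 10, "FamilyId": 4, "LastName": "King", "FirstName": "Bob", "Age": 5, "Gender": "M"},
]


def oldest_adult_male_per_family(people: List[dict], min_age: int = 18) -> List[dict]:
    """Return the oldest male aged >= min_age for each FamilyId (hypothetical helper)."""
    best: Dict[int, dict] = {}
    for person in people:
        # Step 1: keep only males who meet the minimum age.
        if person["Gender"] != "M" or person["Age"] < min_age:
            continue
        # Step 2: within each family, keep the record with the highest age.
        current = best.get(person["FamilyId"])
        if current is None or person["Age"] > current["Age"]:
            best[person["FamilyId"]] = person
    # Families with no qualifying male (3 and 4 here) simply do not appear.
    return [best[family_id] for family_id in sorted(best)]


result = oldest_adult_male_per_family(PEOPLE)
for person in result:
    print(person["id"], person["FamilyId"], person["LastName"], person["FirstName"], person["Age"])
```

Families 3 and 4 drop out because they contain no male aged 18 or over, which matches the two-row result expected above.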

1 Answer:

Answer 0: (score: 2)

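This seems to do it: a combination of a filter aggregation, a terms aggregation, and a top hits aggregation (you have to love the newer features, right?):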

POST /test_index/_search?search_type=count
{
   "aggs": {
      "males_18_and_over": {
         "filter": {
            "and": [
               { "term": { "Gender": "M" } },
               { "range": { "Age": { "gte": 18 } } } 
            ]
         },
         "aggs": {
            "last_names": {
               "terms": {
                  "field": "LastName"
               },
               "aggs": {
                  "max_age": {
                     "top_hits": {
                        "sort": [
                           {
                              "Age": {
                                 "order": "desc"
                              }
                           }
                        ],
                        "size": 1
                     }
                  }
               }
            }
         }
      }
   }
}
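
The request above uses Elasticsearch 1.x syntax: `search_type=count` and the `and` filter were both dropped in later major versions. As a hedged sketch, assuming a newer cluster, the same aggregation can be expressed with `"size": 0` and a `bool` filter. The Python below only builds the request body; the function name, the `families` aggregation name, and the choice to group on `FamilyId` (rather than `LastName`, which happens to map one-to-one to families in this data) are assumptions for illustration. It also assumes `Gender` is mapped as an exact-value field so that the `term` filter matches `"M"`.

```python
import json


def build_oldest_adult_male_query(group_field: str = "FamilyId", min_age: int = 18) -> dict:
    """Build an aggregation-only request body (hypothetical helper)."""
    return {
        # "size": 0 suppresses the normal hit list, as search_type=count did in 1.x.
        "size": 0,
        "aggs": {
            "males_18_and_over": {
                # bool/filter replaces the 1.x "and" filter.
                "filter": {
                    "bool": {
                        "filter": [
                            {"term": {"Gender": "M"}},
                            {"range": {"Age": {"gte": min_age}}},
                        ]
                    }
                },
                "aggs": {
                    "families": {
                        # One bucket per distinct value of group_field.
                        "terms": {"field": group_field},
                        "aggs": {
                            "max_age": {
                                # Keep only the single oldest document per bucket.
                                "top_hits": {
                                    "sort": [{"Age": {"order": "desc"}}],
                                    "size": 1,
                                }
                            }
                        },
                    }
                },
            }
        },
    }


body = build_oldest_adult_male_query()
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `POST /test_index/_search` request. If the terms aggregation name is changed, as it is here, any code that reads the response has to look under the new name.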

It returns:

{
   "took": 4,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 10,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "males_18_and_over": {
         "doc_count": 2,
         "last_names": {
            "buckets": [
               {
                  "key": "Ross",
                  "doc_count": 1,
                  "max_age": {
                     "hits": {
                        "total": 1,
                        "max_score": null,
                        "hits": [
                           {
                              "_index": "test_index",
                              "_type": "doc",
                              "_id": "5",
                              "_score": null,
                              "_source": {
                                 "id": 5,
                                 "FamilyId": 2,
                                 "LastName": "Ross",
                                 "FirstName": "Norman",
                                 "Age": 30,
                                 "Gender": "M"
                              },
                              "sort": [
                                 30
                              ]
                           }
                        ]
                     }
                  }
               },
               {
                  "key": "Smith",
                  "doc_count": 1,
                  "max_age": {
                     "hits": {
                        "total": 1,
                        "max_score": null,
                        "hits": [
                           {
                              "_index": "test_index",
                              "_type": "doc",
                              "_id": "1",
                              "_score": null,
                              "_source": {
                                 "id": 1,
                                 "FamilyId": 1,
                                 "LastName": "Smith",
                                 "FirstName": "John",
                                 "Age": 20,
                                 "Gender": "M"
                              },
                              "sort": [
                                 20
                              ]
                           }
                        ]
                     }
                  }
               }
            ]
         }
      }
   }
}
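
The matching documents sit several levels deep in this response, under each bucket's `max_age.hits.hits[]._source`. A minimal sketch of pulling them out into a flat list, assuming the response has already been parsed into a Python dict with the shape shown above; the helper name is hypothetical, and the sample response is trimmed to the fields the helper reads.

```python
from typing import List


def extract_top_hits(
    response: dict,
    filter_name: str = "males_18_and_over",
    terms_name: str = "last_names",
    hits_name: str = "max_age",
) -> List[dict]:
    """Flatten the per-bucket top_hits results into a list of _source dicts."""
    buckets = response["aggregations"][filter_name][terms_name]["buckets"]
    documents = []
    for bucket in buckets:
        # Each bucket holds at most "size" hits (1 in the query above).
        for hit in bucket[hits_name]["hits"]["hits"]:
            documents.append(hit["_source"])
    return documents


# Trimmed copy of the response shown above.
sample_response = {
    "aggregations": {
        "males_18_and_over": {
            "doc_count": 2,
            "last_names": {
                "buckets": [
                    {
                        "key": "Ross",
                        "doc_count": 1,
                        "max_age": {"hits": {"total": 1, "hits": [{
                            "_id": "5",
                            "_source": {"id": 5, "FamilyId": 2, "LastName": "Ross",
                                        "FirstName": "Norman", "Age": 30, "Gender": "M"},
                        }]}},
                    },
                    {
                        "key": "Smith",
                        "doc_count": 1,
                        "max_age": {"hits": {"total": 1, "hits": [{
                            "_id": "1",
                            "_source": {"id": 1, "FamilyId": 1, "LastName": "Smith",
                                        "FirstName": "John", "Age": 20, "Gender": "M"},
                        }]}},
                    },
                ]
            },
        }
    }
}

people = extract_top_hits(sample_response)
for person in people:
    print(person["id"], person["LastName"], person["FirstName"], person["Age"])
```

The documents come back in bucket order (Ross, then Smith), so sort the list afterwards if the order of the expected output matters.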

Here is the code I used to set it up:

http://sense.qbox.io/gist/04742b9a9ce5b2b25a3829f0ffc719992ef20ad3