SELECT TOP + GROUP BY + SORT in Elasticsearch?

Posted: 2015-11-17 18:58:50

Tags: elasticsearch

Assume the following stockInWarehouse mapping:

{
  "product_db": {
    "mappings": {
      "stockInWarehouse": {
        "properties": {
          "sku": {
            "type": "string"
          },
          "arrivalTime": {
            "type": "date",
            "format": "dateOptionalTime"
          }
        }
      }
    }
  }
}

The data in stockInWarehouse looks like this:

{
  "hits": {
    "total": 5,
    "hits": [
      {
        "_index": "product_db",
        "_type": "stockInWarehouse",
        "_id": "1",
        "_source": {
          "sku": "item 1",
          "arrivalTime": "2015-11-11T19:00:10.231Z"
        }
      },
      {
        "_index": "product_db",
        "_type": "stockInWarehouse",
        "_id": "2",
        "_source": {
          "sku": "item 2",
          "arrivalTime": "2015-11-12T19:00:10.231Z"
        }
      },
      {
        "_index": "product_db",
        "_type": "stockInWarehouse",
        "_id": "3",
        "_source": {
          "sku": "item 1",
          "arrivalTime": "2015-11-12T19:35:10.231Z"
        }
      },
      {
        "_index": "product_db",
        "_type": "stockInWarehouse",
        "_id": "4",
        "_source": {
          "sku": "item 1",
          "arrivalTime": "2015-11-13T19:56:10.231Z"
        }
      },
      {
        "_index": "product_db",
        "_type": "stockInWarehouse",
        "_id": "5",
        "_source": {
          "sku": "item 3",
          "arrivalTime": "2015-11-15T19:56:10.231Z"
        }
      }
    ]
  }
}

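What I want is to get the TOP documents by arrivalTime (that is, the most recent ones), but grouped by another field (sku) so that each sku appears only once. The expected result looks like this: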

{
  "hits": {
    "total": 3,
    "hits": [
      {
        "_index": "product_db",
        "_type": "stockInWarehouse",
        "_id": "5",
        "_source": {
          "sku": "item 3",
          "arrivalTime": "2015-11-15T19:56:10.231Z"
        }
      },
      {
        "_index": "product_db",
        "_type": "stockInWarehouse",
        "_id": "4",
        "_source": {
          "sku": "item 1",
          "arrivalTime": "2015-11-13T19:56:10.231Z"
        }
      },
      {
        "_index": "product_db",
        "_type": "stockInWarehouse",
        "_id": "2",
        "_source": {
          "sku": "item 2",
          "arrivalTime": "2015-11-12T19:00:10.231Z"
        }
      }
    ]
  }
}

If I sort by arrivalTime, the resulting sku list is item 3, item 1, item 1, item 2, item 1 (with duplicates). If I sort by sku, the list no longer reflects the correct arrival-time order.
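To make the requirement concrete: the desired result is the newest document per sku, ordered by arrival time. The following is a minimal local sketch in Python over the sample documents above, not an Elasticsearch query; the `DOCS` list and the `sort_by_arrival` / `latest_per_sku` helpers are hypothetical names used only for illustration. It contrasts a plain sort with a sort followed by per-sku deduplication.

```python
# Sample documents from the question (only the fields that matter here).
DOCS = [
    {"_id": "1", "sku": "item 1", "arrivalTime": "2015-11-11T19:00:10.231Z"},
    {"_id": "2", "sku": "item 2", "arrivalTime": "2015-11-12T19:00:10.231Z"},
    {"_id": "3", "sku": "item 1", "arrivalTime": "2015-11-12T19:35:10.231Z"},
    {"_id": "4", "sku": "item 1", "arrivalTime": "2015-11-13T19:56:10.231Z"},
    {"_id": "5", "sku": "item 3", "arrivalTime": "2015-11-15T19:56:10.231Z"},
]


def sort_by_arrival(docs):
    """Plain sort, newest first.

    All timestamps share one ISO-8601 UTC format, so text order equals time order.
    """
    return sorted(docs, key=lambda d: d["arrivalTime"], reverse=True)


def latest_per_sku(docs):
    """Hypothetical helper: keep only the newest document for each sku, newest first."""
    seen = set()
    result = []
    for doc in sort_by_arrival(docs):
        if doc["sku"] not in seen:  # first occurrence is the newest one for this sku
            seen.add(doc["sku"])
            result.append(doc)
    return result


if __name__ == "__main__":
    print([d["sku"] for d in sort_by_arrival(DOCS)])
    # ['item 3', 'item 1', 'item 1', 'item 2', 'item 1']
    print([d["_id"] for d in latest_per_sku(DOCS)])
    # ['5', '4', '2']
```

The deduplicated list matches the ids 5, 4, 2 in the expected result; the question is how to get Elasticsearch to do this grouping server-side.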

Is this kind of query possible in Elasticsearch? How can I achieve it?

1 answer:

Answer 0 (score: 1)

How about this?

{
  "size": 0,
  "aggs": {
    "terms_agg": {
      "terms": {
        "field": "sku",
        "size": 100,
        "order": {
          "max_date_agg": "desc"
        }
      },
      "aggs": {
        "max_date_agg": {
          "max": {
            "field": "arrivalTime"
          }
        }
      }
    }
  }
}
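To see what this aggregation computes, here is a rough local emulation in Python of a `terms` aggregation on `sku` ordered by a `max` sub-aggregation on `arrivalTime` (descending), run over the sample documents. It only models the semantics under simplifying assumptions (a single shard, exact counts, dates reported as epoch milliseconds the way Elasticsearch returns `max` values on date fields); `to_epoch_millis` and `terms_ordered_by_max_date` are hypothetical names, not part of any Elasticsearch client.

```python
from datetime import datetime, timedelta, timezone

# Sample documents from the question.
DOCS = [
    {"sku": "item 1", "arrivalTime": "2015-11-11T19:00:10.231Z"},
    {"sku": "item 2", "arrivalTime": "2015-11-12T19:00:10.231Z"},
    {"sku": "item 1", "arrivalTime": "2015-11-12T19:35:10.231Z"},
    {"sku": "item 1", "arrivalTime": "2015-11-13T19:56:10.231Z"},
    {"sku": "item 3", "arrivalTime": "2015-11-15T19:56:10.231Z"},
]

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(iso):
    """Parse an ISO-8601 UTC timestamp like '2015-11-15T19:56:10.231Z' into epoch milliseconds."""
    dt = datetime.strptime(iso, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    return (dt - EPOCH) // timedelta(milliseconds=1)  # integer division avoids float rounding


def terms_ordered_by_max_date(docs, size=100):
    """Bucket by sku, track doc_count and max arrivalTime, then order buckets by that max (desc)."""
    buckets = {}
    for doc in docs:
        ts = to_epoch_millis(doc["arrivalTime"])
        bucket = buckets.setdefault(
            doc["sku"], {"key": doc["sku"], "doc_count": 0, "max_date_agg": ts}
        )
        bucket["doc_count"] += 1
        bucket["max_date_agg"] = max(bucket["max_date_agg"], ts)
    ordered = sorted(buckets.values(), key=lambda b: b["max_date_agg"], reverse=True)
    return ordered[:size]  # like the terms "size": buckets past it are dropped


if __name__ == "__main__":
    for b in terms_ordered_by_max_date(DOCS):
        print(b["key"], b["doc_count"], b["max_date_agg"])
```

The ordering comes entirely from the `max` sub-aggregation: each sku collapses to one bucket, and the buckets are sorted by their newest arrival time.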

I assumed you have many products, so I set "size": 100 (the terms aggregation returns only the top 10 buckets by default).

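Note that you need to add "index": "not_analyzed" to the sku field in your mapping; otherwise the string is analyzed and the aggregation buckets on individual tokens such as "item" and "1" instead of whole sku values. Here is the result of the query: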

"aggregations": {
      "terms_agg": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "item 3",
               "doc_count": 1,
               "max_date_agg": {
                  "value": 1447617370231,
                  "value_as_string": "2015-11-15T19:56:10.231Z"
               }
            },
            {
               "key": "item 1",
               "doc_count": 3,
               "max_date_agg": {
                  "value": 1447444570231,
                  "value_as_string": "2015-11-13T19:56:10.231Z"
               }
            },
            {
               "key": "item 2",
               "doc_count": 1,
               "max_date_agg": {
                  "value": 1447354810231,
                  "value_as_string": "2015-11-12T19:00:10.231Z"
               }
            }
         ]
      }
   }
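Regarding the `not_analyzed` note: below is a sketch of what the index-creation body for `PUT /product_db` could look like, written as a Python dict so it can be printed as JSON. It assumes Elasticsearch 1.x/2.x, matching the `string` type and `dateOptionalTime` format in the question; from 5.0 on, the equivalent is a field of type `keyword`. An existing field's index setting cannot be changed in place, so in practice this means creating a new index with this mapping and reindexing the data into it.

```python
import json

# Index-creation body for PUT /product_db (Elasticsearch 1.x/2.x syntax, as in the question).
MAPPING = {
    "mappings": {
        "stockInWarehouse": {
            "properties": {
                # not_analyzed keeps "item 1" as a single term instead of the tokens
                # "item" and "1", so the terms aggregation buckets on whole sku values.
                "sku": {"type": "string", "index": "not_analyzed"},
                "arrivalTime": {"type": "date", "format": "dateOptionalTime"},
            }
        }
    }
}

if __name__ == "__main__":
    print(json.dumps(MAPPING, indent=2))
```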

Hope it helps!