Question

Assume the following stockInWarehouse mapping:

{
  "product_db": {
    "mappings": {
      "stockInWarehouse": {
        "properties": {
          "sku": {
            "type": "string"
          },
          "arrivalTime": {
            "type": "date",
            "format": "dateOptionalTime"
          }
        }
      }
    }
  }
}

The data in stockInWarehouse looks like this:

{
  "hits": {
    "total": 5,
    "hits": [
      {
        "_index": "product_db",
        "_type": "stockInWarehouse",
        "_id": "1",
        "_source": {
          "sku": "item 1",
          "arrivalTime": "2015-11-11T19:00:10.231Z"
        }
      },
      {
        "_index": "product_db",
        "_type": "stockInWarehouse",
        "_id": "2",
        "_source": {
          "sku": "item 2",
          "arrivalTime": "2015-11-12T19:00:10.231Z"
        }
      },
      {
        "_index": "product_db",
        "_type": "stockInWarehouse",
        "_id": "3",
        "_source": {
          "sku": "item 1",
          "arrivalTime": "2015-11-12T19:35:10.231Z"
        }
      },
      {
        "_index": "product_db",
        "_type": "stockInWarehouse",
        "_id": "4",
        "_source": {
          "sku": "item 1",
          "arrivalTime": "2015-11-13T19:56:10.231Z"
        }
      },
      {
        "_index": "product_db",
        "_type": "stockInWarehouse",
        "_id": "5",
        "_source": {
          "sku": "item 3",
          "arrivalTime": "2015-11-15T19:56:10.231Z"
        }
      }
    ]
  }
}

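What I want is to get the top documents by arrivalTime (that is, the most recent ones), but grouped by another field (sku), so that each available sku appears only once. The expected result is: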

{
  "hits": {
    "total": 3,
    "hits": [
      {
        "_index": "product_db",
        "_type": "stockInWarehouse",
        "_id": "5",
        "_source": {
          "sku": "item 3",
          "arrivalTime": "2015-11-15T19:56:10.231Z"
        }
      },
      {
        "_index": "product_db",
        "_type": "stockInWarehouse",
        "_id": "4",
        "_source": {
          "sku": "item 1",
          "arrivalTime": "2015-11-13T19:56:10.231Z"
        }
      },
      {
        "_index": "product_db",
        "_type": "stockInWarehouse",
        "_id": "2",
        "_source": {
          "sku": "item 2",
          "arrivalTime": "2015-11-12T19:00:10.231Z"
        }
      }
    ]
  }
}

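If I sort by arrivalTime, the resulting sku list is item 3, item 1, item 1, item 2, item 1, which contains duplicates. If I sort by sku, the result no longer reflects the correct arrival-time order.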
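To make the difference concrete, here is a small local Python sketch over the five sample documents. It is only an in-memory illustration, not an Elasticsearch query; `sort_by_arrival` and `latest_per_sku` are hypothetical helper names. A plain sort repeats skus, while keeping the newest document per sku and then ordering those newest first yields exactly the expected ids 5, 4, 2.

```python
from datetime import datetime

# Sample documents copied from the question: (_id, sku, arrivalTime).
DOCS = [
    ("1", "item 1", "2015-11-11T19:00:10.231Z"),
    ("2", "item 2", "2015-11-12T19:00:10.231Z"),
    ("3", "item 1", "2015-11-12T19:35:10.231Z"),
    ("4", "item 1", "2015-11-13T19:56:10.231Z"),
    ("5", "item 3", "2015-11-15T19:56:10.231Z"),
]


def parse(ts: str) -> datetime:
    # Older fromisoformat versions reject a trailing "Z", so spell out the UTC offset.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def sort_by_arrival(docs):
    """Plain sort on arrivalTime, newest first: the same sku can appear many times."""
    return sorted(docs, key=lambda d: parse(d[2]), reverse=True)


def latest_per_sku(docs):
    """Keep only the newest document for each sku, then order those newest first."""
    latest = {}
    for doc in docs:
        _id, sku, ts = doc
        if sku not in latest or parse(ts) > parse(latest[sku][2]):
            latest[sku] = doc
    return sorted(latest.values(), key=lambda d: parse(d[2]), reverse=True)


print([d[1] for d in sort_by_arrival(DOCS)])  # skus with duplicates
print([d[0] for d in latest_per_sku(DOCS)])   # one id per sku
```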

Is this kind of query possible in Elasticsearch? How can I achieve it?

Answer 1

How about this?

{
  "size": 0,
  "aggs": {
    "terms_agg": {
      "terms": {
        "field": "sku",
        "size": 100,
        "order": {
          "max_date_agg": "desc"
        }
      },
      "aggs": {
        "max_date_agg": {
          "max": {
            "field": "arrivalTime"
          }
        }
      }
    }
  }
}

I assumed you have many products, so I set "size": 100 (a terms aggregation returns only the top 10 buckets by default).
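To see why ordering the terms buckets by the max_date_agg sub-aggregation produces the desired order, here is an in-memory Python sketch that mimics it on the sample data. It assumes a single shard and no ties, and `epoch_millis` / `terms_by_max_date` are hypothetical names; the real aggregation runs inside Elasticsearch. Each sku becomes one bucket, the bucket keeps the latest arrivalTime as epoch milliseconds, buckets are sorted by that value descending, and `size` caps how many buckets come back.

```python
from datetime import datetime, timedelta, timezone

# Sample documents copied from the question: (_id, sku, arrivalTime).
DOCS = [
    ("1", "item 1", "2015-11-11T19:00:10.231Z"),
    ("2", "item 2", "2015-11-12T19:00:10.231Z"),
    ("3", "item 1", "2015-11-12T19:35:10.231Z"),
    ("4", "item 1", "2015-11-13T19:56:10.231Z"),
    ("5", "item 3", "2015-11-15T19:56:10.231Z"),
]

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_millis(ts: str) -> int:
    # Integer division of timedeltas avoids float rounding on the milliseconds.
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return (dt - EPOCH) // timedelta(milliseconds=1)


def terms_by_max_date(docs, size=10):
    """One bucket per sku with a max(arrivalTime) sub-value, ordered by it, descending."""
    buckets = {}
    for _id, sku, ts in docs:
        b = buckets.setdefault(sku, {"key": sku, "doc_count": 0, "max_date_agg": {"value": None}})
        b["doc_count"] += 1
        ms = epoch_millis(ts)
        if b["max_date_agg"]["value"] is None or ms > b["max_date_agg"]["value"]:
            b["max_date_agg"]["value"] = ms
    ordered = sorted(buckets.values(), key=lambda b: b["max_date_agg"]["value"], reverse=True)
    return ordered[:size]  # like "size" on the terms aggregation


for bucket in terms_by_max_date(DOCS, size=100):
    print(bucket["key"], bucket["doc_count"], bucket["max_date_agg"]["value"])
```

The keys, counts and values this prints line up with the buckets in the query result shown in this answer.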

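Note that you need to add "index": "not_analyzed" to the sku field in your mapping. Otherwise sku is analyzed, and the terms aggregation buckets individual tokens such as "item" and "1" instead of whole values such as "item 1".

Here is the result of the query: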

"aggregations": {
  "terms_agg": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "item 3",
        "doc_count": 1,
        "max_date_agg": {
          "value": 1447617370231,
          "value_as_string": "2015-11-15T19:56:10.231Z"
        }
      },
      {
        "key": "item 1",
        "doc_count": 3,
        "max_date_agg": {
          "value": 1447444570231,
          "value_as_string": "2015-11-13T19:56:10.231Z"
        }
      },
      {
        "key": "item 2",
        "doc_count": 1,
        "max_date_agg": {
          "value": 1447354810231,
          "value_as_string": "2015-11-12T19:00:10.231Z"
        }
      }
    ]
  }
}
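The not_analyzed requirement mentioned in this answer can be illustrated with a rough Python sketch. It assumes the analyzed sku field uses something close to the standard analyzer, approximated here as "lowercase and split on non-word characters" (the real analyzer follows Unicode segmentation rules); `rough_standard_tokens` and `term_buckets` are hypothetical names. Counting one term per document shows how an analyzed field splits "item 1" into separate buckets.

```python
import re
from collections import Counter

# sku values of the five sample documents, in _id order.
SKUS = ["item 1", "item 2", "item 1", "item 1", "item 3"]


def rough_standard_tokens(value: str):
    # Rough approximation of the standard analyzer: lowercase, split on non-word chars.
    return [t for t in re.split(r"\W+", value.lower()) if t]


def term_buckets(values, analyzed: bool):
    """Count documents per indexed term, the way a terms aggregation sees the field."""
    counts = Counter()
    for value in values:
        terms = rough_standard_tokens(value) if analyzed else [value]
        for term in set(terms):  # each document counts once per distinct term
            counts[term] += 1
    return dict(counts)


print(term_buckets(SKUS, analyzed=True))   # tokens such as "item" and "1"
print(term_buckets(SKUS, analyzed=False))  # whole sku values
```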

I hope it helps!

SELECT TOP + GROUP BY + SORT in Elasticsearch?
