Elasticsearch: sort by the array element that satisfies a filter condition

Time: 2018-12-04 07:57:14

Tags: elasticsearch elasticsearch-2.0

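My type has a field that is an array of times in ISO 8601 format. I want to get all the listings that have a time on a certain day, and then order them by the earliest time at which they occur on that specific day. The problem is that my query orders them by the earliest time across all days.

You can reproduce the problem below.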

curl -XPUT 'localhost:9200/listings?pretty'

curl -XPOST 'localhost:9200/listings/listing/_bulk?pretty' -d '
{"index": { } }
{ "name": "second on 6th (3rd on the 5th)", "times": ["2018-12-05T12:00:00","2018-12-06T11:00:00"] }
{"index": { } }
{ "name": "third on 6th (1st on the 5th)", "times": ["2018-12-05T10:00:00","2018-12-06T12:00:00"] }
{"index": { } }
{ "name": "first on the 6th (2nd on the 5th)", "times": ["2018-12-05T11:00:00","2018-12-06T10:00:00"] }
'

# because ES takes time to add them to index 
sleep 2

echo "Query listings on the 6th!"

curl -XPOST 'localhost:9200/listings/_search?pretty' -d '
{
  "sort": {
    "times": {
      "order": "asc",
      "nested_filter": {
        "range": {
          "times": {
            "gte": "2018-12-06T00:00:00",
            "lte": "2018-12-06T23:59:59"
          }
        }
      }
    }
  },
  "query": {
    "bool": {
      "filter": {
        "range": {
          "times": {
            "gte": "2018-12-06T00:00:00",
            "lte": "2018-12-06T23:59:59"
          }
        }
      }
    }
  }
}'

curl -XDELETE 'localhost:9200/listings?pretty'

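Adding the script above to a .sh file and running it reproduces the problem. You will see that the results are ordered by the times on the 5th, not the 6th. Elasticsearch converts the times to epoch_millis numbers for sorting; you can see the epoch number in the sort field of each hit, e.g. 1544007600000. For an asc sort, it takes the smallest number in the array (the order of the array does not matter) and sorts on that.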
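To make that behaviour concrete, here is a small Python sketch that mimics it outside Elasticsearch. It is only a simulation under two assumptions: the timestamps are treated as UTC, and an ascending sort on a multi-valued field uses each document's smallest value as its key. The names `LISTINGS`, `epoch_millis` and `default_asc_sort` are made up for this illustration.

```python
from datetime import datetime, timezone

# Sample documents copied from the bulk request above.
LISTINGS = [
    {"name": "second on 6th (3rd on the 5th)",
     "times": ["2018-12-05T12:00:00", "2018-12-06T11:00:00"]},
    {"name": "third on 6th (1st on the 5th)",
     "times": ["2018-12-05T10:00:00", "2018-12-06T12:00:00"]},
    {"name": "first on the 6th (2nd on the 5th)",
     "times": ["2018-12-05T11:00:00", "2018-12-06T10:00:00"]},
]


def epoch_millis(iso):
    """Convert an offset-less ISO 8601 timestamp to epoch milliseconds (assumes UTC)."""
    dt = datetime.strptime(iso, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


def default_asc_sort(listings):
    """Sort ascending using each document's smallest array value as the sort key."""
    keyed = [(min(epoch_millis(t) for t in doc["times"]), doc["name"])
             for doc in listings]
    return sorted(keyed)


if __name__ == "__main__":
    for key, name in default_asc_sort(LISTINGS):
        print(key, name)
```

Every sort key here comes from the 5th, because each listing's smallest value falls on that day; the range filter in the query only decides which documents match, not which array element is used for sorting.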

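Somehow I need the results to be ordered by the earliest time that occurs on the queried day, i.e. the 6th.

I am currently using Elasticsearch 2.4, but it would also be great if someone could show me how this is done in the current version.

In case it helps, here are their docs on nested queries and scripting.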

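1 answer:

Answer 0 (score: 3)

I think the problem here is that nested sorting is meant for nested objects, not for arrays.

If you convert the document to use an array of nested objects instead of a simple array of dates, you can construct a nested filtered sort that works.

The following is for Elasticsearch 6.0. The syntax changed slightly from 6.1 onwards, and I am not sure how much of this works in 2.x: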

Mapping:

PUT nested-listings
{
  "mappings": {
    "listing": {
      "properties": {
        "name": {
          "type": "keyword"
        },
        "openTimes": {
          "type": "nested",
          "properties": {
            "date": {
              "type": "date"
            }
          }
        }
      }
    }
  }
}

Data:

POST nested-listings/listing/_bulk
{"index": { } }
{ "name": "second on 6th (3rd on the 5th)", "openTimes": [ { "date": "2018-12-05T12:00:00" }, { "date": "2018-12-06T11:00:00" }] }
{"index": { } }
{ "name": "third on 6th (1st on the 5th)", "openTimes": [ {"date": "2018-12-05T10:00:00"}, { "date": "2018-12-06T12:00:00" }] }
{"index": { } }
{ "name": "first on the 6th (2nd on the 5th)", "openTimes": [ {"date": "2018-12-05T11:00:00" }, { "date": "2018-12-06T10:00:00" }] }

So instead of a flat array of dates ("times" in the question), each listing has an "openTimes" array of nested objects, each with a "date" field.
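If the existing documents need to be reshaped before reindexing, the conversion could look roughly like the Python sketch below. It assumes the source field is called `times` as in the question; `to_nested_listing` and `to_bulk_body` are hypothetical helper names, and the sketch only builds the request body as a string rather than talking to a cluster.

```python
import json


def to_nested_listing(doc, source_field="times", nested_field="openTimes", key="date"):
    """Return a copy of doc with the flat date array replaced by an array of objects."""
    converted = {k: v for k, v in doc.items() if k != source_field}
    converted[nested_field] = [{key: value} for value in doc.get(source_field, [])]
    return converted


def to_bulk_body(docs):
    """Build a newline-delimited _bulk body: one action line plus one source line per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(to_nested_listing(doc)))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    flat = [{"name": "second on 6th (3rd on the 5th)",
             "times": ["2018-12-05T12:00:00", "2018-12-06T11:00:00"]}]
    print(to_bulk_body(flat), end="")
```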

Now the search:

POST nested-listings/_search
{
  "sort": {
    "openTimes.date": {
      "order": "asc",
      "nested_path": "openTimes",
      "nested_filter": {
        "range": {
          "openTimes.date": {
            "gte": "2018-12-06T00:00:00",
            "lte": "2018-12-06T23:59:59"
          }
        }
      }
    }
  },
  "query": {
    "nested": {
      "path": "openTimes",
      "query": {
        "bool": {
          "filter": {
            "range": {
              "openTimes.date": {
                "gte": "2018-12-06T00:00:00",
                "lte": "2018-12-06T23:59:59"
              }
            }
          }
        }
      }
    }
  }
}

The main difference here is that the query is slightly different, because you need to use a "nested" query to filter on nested objects.
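As for the syntax change from 6.1 onwards: as far as I know, `nested_path` and `nested_filter` were deprecated in favour of a single `nested` object with `path` and `filter` keys. The Python sketch below builds such a request body as a plain dict and prints it as JSON. `build_search_body` is a hypothetical helper, and the result has not been run against a cluster here, so treat it as a starting point and check it against the documentation for your version.

```python
import json


def build_search_body(path, field, gte, lte, order="asc"):
    """Build a search body that uses the 6.1+ style nested sort (assumed syntax)."""
    full_field = f"{path}.{field}"
    day_range = {"range": {full_field: {"gte": gte, "lte": lte}}}
    return {
        "sort": [
            {
                full_field: {
                    "order": order,
                    # Replaces the older nested_path / nested_filter pair.
                    "nested": {"path": path, "filter": day_range},
                }
            }
        ],
        "query": {
            "nested": {
                "path": path,
                "query": {"bool": {"filter": day_range}},
            }
        },
    }


if __name__ == "__main__":
    body = build_search_body("openTimes", "date",
                             "2018-12-06T00:00:00", "2018-12-06T23:59:59")
    print(json.dumps(body, indent=2))
```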

The 6.0 search request above gives the following result:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": null,
    "hits": [
      {
        "_index": "nested-listings",
        "_type": "listing",
        "_id": "vHH6e2cB28sphqox2Dcm",
        "_score": null,
        "_source": {
          "name": "first on the 6th (2nd on the 5th)"
        },
        "sort": [
          1544090400000
        ]
      },
      {
        "_index": "nested-listings",
        "_type": "listing",
        "_id": "unH6e2cB28sphqox2Dcm",
        "_score": null,
        "_source": {
          "name": "second on 6th (3rd on the 5th)"
        },
        "sort": [
          1544094000000
        ]
      },
      {
        "_index": "nested-listings",
        "_type": "listing",
        "_id": "u3H6e2cB28sphqox2Dcm",
        "_score": null,
        "_source": {
          "name": "third on 6th (1st on the 5th)"
        },
        "sort": [
          1544097600000
        ]
      }
    ]
  }
}
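The sort values in that response can be cross-checked with a small Python simulation of what the nested filtered sort does conceptually: keep only the nested objects that fall in the range, then take the smallest remaining value per listing. This is a sketch rather than Elasticsearch's actual implementation; it assumes the timestamps are UTC, and `nested_filtered_sort` is a made-up name.

```python
from datetime import datetime, timezone

# Same documents as in the bulk request for nested-listings.
NESTED_LISTINGS = [
    {"name": "second on 6th (3rd on the 5th)",
     "openTimes": [{"date": "2018-12-05T12:00:00"}, {"date": "2018-12-06T11:00:00"}]},
    {"name": "third on 6th (1st on the 5th)",
     "openTimes": [{"date": "2018-12-05T10:00:00"}, {"date": "2018-12-06T12:00:00"}]},
    {"name": "first on the 6th (2nd on the 5th)",
     "openTimes": [{"date": "2018-12-05T11:00:00"}, {"date": "2018-12-06T10:00:00"}]},
]


def epoch_millis(iso):
    """Convert an offset-less ISO 8601 timestamp to epoch milliseconds (assumes UTC)."""
    dt = datetime.strptime(iso, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


def nested_filtered_sort(listings, gte, lte):
    """Sort by the smallest nested date inside [gte, lte]; skip listings with none."""
    low, high = epoch_millis(gte), epoch_millis(lte)
    keyed = []
    for doc in listings:
        millis = [epoch_millis(entry["date"]) for entry in doc["openTimes"]]
        in_range = [m for m in millis if low <= m <= high]
        if in_range:  # mirrors the nested query, which drops non-matching listings
            keyed.append((min(in_range), doc["name"]))
    return sorted(keyed)


if __name__ == "__main__":
    for key, name in nested_filtered_sort(
            NESTED_LISTINGS, "2018-12-06T00:00:00", "2018-12-06T23:59:59"):
        print(key, name)
```

Under those assumptions the three keys are 1544090400000, 1544094000000 and 1544097600000, matching the sort values in the response.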

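I don't think you can actually pick a single value out of an array in ES, so for sorting you will always be sorting on all of its values. The best you can do with a plain array is choose how the array is treated for sorting purposes (use the lowest, highest, mean, etc.).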
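To illustrate what choosing a mode means for the sample data, the Python sketch below reduces each plain array to one key with min, max or the mean and sorts on that. It is a simulation only: the timestamps are assumed to be UTC, and `order_by_mode` is a made-up name.

```python
from datetime import datetime, timezone
from statistics import mean

# Listing name -> plain array of times, as in the question.
LISTING_TIMES = {
    "second on 6th (3rd on the 5th)": ["2018-12-05T12:00:00", "2018-12-06T11:00:00"],
    "third on 6th (1st on the 5th)": ["2018-12-05T10:00:00", "2018-12-06T12:00:00"],
    "first on the 6th (2nd on the 5th)": ["2018-12-05T11:00:00", "2018-12-06T10:00:00"],
}

# How each mode collapses an array into a single sort key.
MODES = {"min": min, "max": max, "avg": mean}


def epoch_millis(iso):
    """Convert an offset-less ISO 8601 timestamp to epoch milliseconds (assumes UTC)."""
    dt = datetime.strptime(iso, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


def order_by_mode(listing_times, mode):
    """Return listing names in ascending order of the key produced by the given mode."""
    reduce_values = MODES[mode]
    return sorted(
        listing_times,
        key=lambda name: reduce_values([epoch_millis(t) for t in listing_times[name]]),
    )


if __name__ == "__main__":
    for mode in MODES:
        print(mode, order_by_mode(LISTING_TIMES, mode))
```

With this data, max happens to give the order for the 6th, but only because the 6th is the latest day in every array; it would stop working as soon as a listing also had a time on the 7th, which is why the nested approach is the more reliable one.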