What is the equivalent of SELECT TOP N in Elasticsearch?

Time: 2014-03-07 21:57:30

Tags: elasticsearch

Assume the following mapping:

{
  "document" : {
    "properties" : {
      "DocumentTitle" : {"type":"string", "index":"not_analyzed", "analyzer":"keyword", "store":true },
      "ReceptionDate" : {"type":"date", "format":"yyyy-MM-dd HH:mm", "store":true }
    }
  }
}

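What I want to do is get the TOP 5 documents by reception date (that is, the 5 most recent documents), but I want them ordered by another field (DocumentTitle). Simply sorting by date and limiting to 5 results is therefore not enough.

Is this possible with a single query, or does it take multiple queries?

Update (at Sidharthan's request):

I come from the RDBMS world, where this is a very common problem that is solved with TOP or GROUP BY statements. So I expected a simple yes/no answer as to whether Elasticsearch supports this kind of functionality (TOP).

I created the demo data below to help you better understand my problem: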

PUT http://localhost:9200/custom/

POST http://localhost:9200/custom/document/_mapping
POST data:
{
    "document" : {
         "properties":{
             "DocumentTitle": { "type": "string", "store": true },
             "ReceptionDate": { "type": "date", "format" : "yyyy-MM-dd'T'HH:mmZ", "store": true }
        }
    }
}

POST http://localhost:9200/custom/document/
POST data:
{
    "DocumentTitle":"A.PDF",
    "ReceptionDate":"2001-01-01T00:00+0000"
}


POST http://localhost:9200/custom/document/
POST data:
{
    "DocumentTitle":"B.PDF",
    "ReceptionDate":"2002-01-01T00:00+0000"
}

POST http://localhost:9200/custom/document/
POST data:
{
    "DocumentTitle":"C.PDF",
    "ReceptionDate":"2003-01-01T00:00+0000"
}


POST http://localhost:9200/custom/document/
POST data:
{
    "DocumentTitle":"D.PDF",
    "ReceptionDate":"2004-01-01T00:00+0000"
}

POST http://localhost:9200/custom/document/
POST data:
{
    "DocumentTitle":"E.PDF",
    "ReceptionDate":"2005-01-01T00:00+0000"
}

POST http://localhost:9200/custom/document/
POST data:
{
    "DocumentTitle":"F.PDF",
    "ReceptionDate":"2006-01-01T00:00+0000"
}

POST http://localhost:9200/custom/document/
POST data:
{
    "DocumentTitle":"G.PDF",
    "ReceptionDate":"2006-01-01T00:00+0000"
}

The result of Sidharthan's proposal is as follows (I use URI search here to keep the post shorter):

GET http://localhost:9200/custom/document/_search?q=DocumentTitle:*&sort=ReceptionDate:desc,DocumentTitle:asc&fields=ReceptionDate,DocumentTitle&size=5&pretty=true

- Response -

{
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 7,
    "max_score" : null,
    "hits" : [ {
      "_index" : "custom",
      "_type" : "document",
      "_id" : "v6gLeB9kSOCc5OgoTLT6BA",
      "_score" : null, "_source" : {
    "DocumentTitle":"F.PDF",
    "ReceptionDate":"2006-01-01T00:00+0000"
},
      "sort" : [ 1136073600000, "f.pdf" ]
    }, {
      "_index" : "custom",
      "_type" : "document",
      "_id" : "DJGivLtOQsW6DAGA5wgQzA",
      "_score" : null, "_source" : {
    "DocumentTitle":"G.PDF",
    "ReceptionDate":"2006-01-01T00:00+0000"
},
      "sort" : [ 1136073600000, "g.pdf" ]
    }, {
      "_index" : "custom",
      "_type" : "document",
      "_id" : "ic3v37xGQtydrjb-RaJl4g",
      "_score" : null, "_source" : {
    "DocumentTitle":"E.PDF",
    "ReceptionDate":"2005-01-01T00:00+0000"
},
      "sort" : [ 1104537600000, "e.pdf" ]
    }, {
      "_index" : "custom",
      "_type" : "document",
      "_id" : "kCcgoiodQKuxsD9n6ZGifw",
      "_score" : null, "_source" : {
    "DocumentTitle":"D.PDF",
    "ReceptionDate":"2004-01-01T00:00+0000"
},
      "sort" : [ 1072915200000, "d.pdf" ]
    }, {
      "_index" : "custom",
      "_type" : "document",
      "_id" : "jUYP0d3pSmSjlMqw3TsS1Q",
      "_score" : null, "_source" : {
    "DocumentTitle":"C.PDF",
    "ReceptionDate":"2003-01-01T00:00+0000"
},
      "sort" : [ 1041379200000, "c.pdf" ]
    } ]
  }
}

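Given the data, this is exactly the right result set. However, it is in the wrong order.

I need these items ordered by DocumentTitle only (C, D, E, F, G).

Unless ES supports some kind of TOP, I think the only solution is to fetch the result set sorted by ReceptionDate and then re-sort it manually on the client side, as kielni suggested.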
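The client-side workaround described above can be sketched in Python. This is only an illustration that works on an in-memory copy of the demo data: the function name `top_n_by_date_sorted_by_title` and the `DOCS` list are hypothetical, and in practice the input would come from the hits of a search sorted by ReceptionDate with `size=5`.

```python
from datetime import datetime

# In-memory copy of the demo documents from the question (assumption:
# in a real client these would be the _source objects of the search hits).
DOCS = [
    {"DocumentTitle": "A.PDF", "ReceptionDate": "2001-01-01T00:00+0000"},
    {"DocumentTitle": "B.PDF", "ReceptionDate": "2002-01-01T00:00+0000"},
    {"DocumentTitle": "C.PDF", "ReceptionDate": "2003-01-01T00:00+0000"},
    {"DocumentTitle": "D.PDF", "ReceptionDate": "2004-01-01T00:00+0000"},
    {"DocumentTitle": "E.PDF", "ReceptionDate": "2005-01-01T00:00+0000"},
    {"DocumentTitle": "F.PDF", "ReceptionDate": "2006-01-01T00:00+0000"},
    {"DocumentTitle": "G.PDF", "ReceptionDate": "2006-01-01T00:00+0000"},
]


def parse_date(value):
    # Mirrors the mapping format yyyy-MM-dd'T'HH:mmZ used in the demo data.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M%z")


def top_n_by_date_sorted_by_title(docs, n):
    # Step 1: keep the n most recent documents (the "TOP N" part).
    newest = sorted(docs, key=lambda d: parse_date(d["ReceptionDate"]), reverse=True)[:n]
    # Step 2: re-order only those n documents by title.
    return sorted(newest, key=lambda d: d["DocumentTitle"])


titles = [d["DocumentTitle"] for d in top_n_by_date_sorted_by_title(DOCS, 5)]
print(titles)
```

If several documents share the same ReceptionDate at the cut-off, which of them makes it into the top N depends on the order in which they arrive, so a tie-breaking rule may be needed.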

2 Answers:

Answer 0: (score: 0)

Use a sort on both fields: sort first by ReceptionDate, then by DocumentTitle. Try:

{
  "sort": [
    {
      "ReceptionDate": {
        "order": "desc"
      }
    },
    {
      "DocumentTitle": "asc"
    }
  ],
  "query": {
    "match_all": {}
  },
  "size": 5
}
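As a rough illustration of what a two-key sort like this produces, the Python sketch below applies the same ordering (date descending, then title ascending) to a simplified copy of the demo data. Representing each document as a `(title, year)` tuple is an assumption made for brevity. The title only breaks ties between equal dates, so the first five results come out as F, G, E, D, C, which matches the response shown in the question.

```python
# Simplified stand-in for the demo documents: (DocumentTitle, reception year).
docs = [
    ("A.PDF", 2001),
    ("B.PDF", 2002),
    ("C.PDF", 2003),
    ("D.PDF", 2004),
    ("E.PDF", 2005),
    ("F.PDF", 2006),
    ("G.PDF", 2006),
]

# Primary key: year descending; secondary key: title ascending.
# The secondary key is only consulted when two years are equal.
by_date_then_title = sorted(docs, key=lambda d: (-d[1], d[0]))

# Equivalent of "size": 5.
top5_titles = [title for title, _ in by_date_then_title[:5]]
print(top5_titles)
```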

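Answer 1: (score: 0)

It seems that ES currently cannot do this.

I will keep fetching the result set and applying the ordering on the client.

Thanks, everyone.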