How to get the max _id from Elasticsearch

Date: 2015-03-02 14:04:55

Tags: elasticsearch

I created a river that runs every hour to fetch data from the DB (using the JDBC river plugin).

select * from orders

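Instead of selecting all records, I want to select only the rows appended since the last run, based on the primary key. The query would be: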

select * from orders where deviceid > '(Max Id in Elastic search)'

How can I get the max _id from Elasticsearch?

1 Answer:

Answer 0: (score: 1)

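It doesn't seem possible to do this directly with the "_id" field, because ES insists on converting "_id" values to strings. But there is a way around it.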
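As a side note (not part of the original answer), here is a small Python sketch of why string ids are awkward for finding a maximum even outside ES: strings compare character by character, so the largest string is not necessarily the largest number. The sample ids are made up for illustration.

```python
# Hypothetical ids, stored as strings the way ES stores "_id".
ids_as_strings = ["1", "2", "9", "10"]

# String comparison is lexicographic: "9" sorts after "10".
lexicographic_max = max(ids_as_strings)

# Converting to integers first gives the numeric maximum.
numeric_max = max(int(i) for i in ids_as_strings)

print(lexicographic_max, numeric_max)
```

This is why the approach below keeps the id in a separate numeric field rather than relying on "_id" itself.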

First, I set up a simple index with a few documents, like this:

PUT /test_index
{
   "settings": {
      "number_of_shards": 1
   }
}

POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"doc","_id":1}}
{"title":"first doc"}
{"index":{"_index":"test_index","_type":"doc","_id":2}}
{"title":"second doc"}
{"index":{"_index":"test_index","_type":"doc","_id":3}}
{"title":"third doc"}

Then I tried a max aggregation, but got an error because "_id" is a string:

POST /test_index/_search?search_type=count
{
   "aggs": {
      "max_id": {
         "max": {
            "field": "_id"
         }
      }
   }
}
...
{
   "error": "SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[bQS7TqO9SfKSPQZYVXQBag][test_index][0]: ClassCastException[org.elasticsearch.index.fielddata.plain.PagedBytesIndexFieldData cannot be cast to org.elasticsearch.index.fielddata.IndexNumericFieldData]}]",
   "status": 500
}

So that doesn't work. But it does work with a slight modification, using the "path" parameter of the "_id" field.

So I redefined the index as:

DELETE /test_index

PUT /test_index
{
   "settings": {
      "number_of_shards": 1
   },
   "mappings": {
      "doc": {
         "_id": {
            "path": "doc_id"
         }
      }
   }
}

Then I indexed the documents using the "doc_id" path:

POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"first doc","doc_id":1}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"second doc","doc_id":2}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"third doc","doc_id":3}
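If the documents come from a program rather than being typed by hand, the bulk body above could be generated along these lines. This is only a sketch: the helper name `build_bulk_body` is made up for illustration, and it only builds the newline-delimited payload, it does not send it to ES.

```python
import json

def build_bulk_body(docs, index="test_index", doc_type="doc"):
    """Build a newline-delimited bulk payload: one action line per document,
    followed by the document source. No "_id" is set in the action line,
    since the mapping above takes it from the "doc_id" path."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"

docs = [
    {"title": "first doc", "doc_id": 1},
    {"title": "second doc", "doc_id": 2},
    {"title": "third doc", "doc_id": 3},
]
body = build_bulk_body(docs)
print(body)
```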

Now if I search, I can see that "_id" is still a string, but "doc_id" is an integer:

POST /test_index/_search
...
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 1,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "1",
            "_score": 1,
            "_source": {
               "title": "first doc",
               "doc_id": 1
            }
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": 1,
            "_source": {
               "title": "second doc",
               "doc_id": 2
            }
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "3",
            "_score": 1,
            "_source": {
               "title": "third doc",
               "doc_id": 3
            }
         }
      ]
   }
}

So now I can easily use a max aggregation to find the max id value:

POST /test_index/_search?search_type=count
{
   "aggs": {
      "max_id": {
         "max": {
            "field": "doc_id"
         }
      }
   }
}
...
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "max_id": {
         "value": 3
      }
   }
}
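
To tie this back to the question, here is a rough Python sketch of how the aggregation result might be turned into the incremental SQL query. It is not part of the original answer: the function names are hypothetical, the column `deviceid` and table `orders` are taken from the question, and the response is a trimmed copy of the one shown above rather than a live call to ES. How the resulting query is handed to the JDBC river is left out.

```python
import json

def build_max_agg_body(field):
    """Request body for a max aggregation on a numeric field."""
    return {"aggs": {"max_id": {"max": {"field": field}}}}

def extract_max_id(response):
    """Return the aggregated max as an int, or None if it is absent or null."""
    value = response.get("aggregations", {}).get("max_id", {}).get("value")
    return None if value is None else int(value)

def build_incremental_query(max_id, column="deviceid", table="orders"):
    """Return (sql, params). Falls back to a full select when no max is known."""
    if max_id is None:
        return "select * from {}".format(table), ()
    return "select * from {} where {} > ?".format(table, column), (max_id,)

# Trimmed copy of the aggregation response shown above.
sample_response = json.loads('{"aggregations": {"max_id": {"value": 3}}}')

max_id = extract_max_id(sample_response)
sql, params = build_incremental_query(max_id)
print(sql, params)
```

The id is passed as a query parameter instead of being pasted into the SQL string, which keeps it numeric and avoids quoting problems.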