Retrieving information from Elasticsearch in the order of an input array

Time: 2014-01-15 18:19:21

Tags: php arrays elasticsearch

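I can't seem to find an answer to this, so I decided to post the question and see whether anyone can help.

In my application I have an array of IDs that comes from the backend, already sorted the way I need, for example: [0] => 23, [1] => 12, [2] => 45, [3] => 21

I then use a terms filter to "ask" Elasticsearch for the information corresponding to each ID in this array. The problem is that the results do not come back in the order of the IDs I sent, so they end up shuffled, for example: [0] => 21, [1] => 45, [2] => 23, [3] => 12

Note that I cannot make Elasticsearch sort by the same criterion that ordered the array in the backend.

I also cannot sort the results in PHP, because I retrieve paginated results from Elasticsearch. With 2 results per page, Elasticsearch might return only [0] => 21, [1] => 45 for the first page, so reordering in PHP would not give the right page either.

How can I get the results sorted in the order of the input array? Any ideas?

Thanks in advance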

1 answer:

Answer 0: (score: 3)

You can achieve this with custom script scoring, as follows.

First I created some dummy data:

curl -XPUT "http://localhost:9200/test_index"

curl -XPOST "http://localhost:9200/test_index/_bulk" -d'
{ "index" : { "_index" : "test_index", "_type" : "docs", "_id" : 1 } }
{ "name" : "Document 1", "id" : 1 }
{ "index" : { "_index" : "test_index", "_type" : "docs", "_id" : 2 } }
{ "name" : "Document 2", "id" : 2 }
{ "index" : { "_index" : "test_index", "_type" : "docs", "_id" : 3 } }
{ "name" : "Document 3", "id" : 3 }
{ "index" : { "_index" : "test_index", "_type" : "docs", "_id" : 4 } }
{ "name" : "Document 4", "id" : 4 }
{ "index" : { "_index" : "test_index", "_type" : "docs", "_id" : 5 } }
{ "name" : "Document 5", "id" : 5 }
{ "index" : { "_index" : "test_index", "_type" : "docs", "_id" : 6 } }
{ "name" : "Document 6", "id" : 6 }
{ "index" : { "_index" : "test_index", "_type" : "docs", "_id" : 7 } }
{ "name" : "Document 7", "id" : 7 }
{ "index" : { "_index" : "test_index", "_type" : "docs", "_id" : 8 } }
{ "name" : "Document 8", "id" : 8 }
{ "index" : { "_index" : "test_index", "_type" : "docs", "_id" : 9 } }
{ "name" : "Document 9", "id" : 9 }
{ "index" : { "_index" : "test_index", "_type" : "docs", "_id" : 10 } }
{ "name" : "Document 10", "id" : 10 }
'

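I used an "id" field even though it is redundant, because the "_id" field gets converted to a string, and the script is easier to write with integers.

You can use the ids filter to return a specific set of documents by ID: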

curl -XPOST "http://localhost:9200/test_index/_search" -d'
{
   "filter": {
      "ids": {
         "type": "docs",
         "values": [ 1, 8, 2, 5 ]
      }
   }
}'

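But these will not necessarily come back in the order you want. With script based scoring (applied here through a "_script" sort), you can define your own ordering based on the document IDs.

Here I pass in a parameter that is a list of objects associating each ID with a score. The script simply loops through them until it finds the current document's ID and returns the predetermined score for that document (or 0 if the ID is not listed).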

curl -XPOST "http://localhost:9200/test_index/_search" -d'
{
   "filter": {
      "ids": {
         "type": "docs",
         "values": [ 1, 8, 2, 5 ]
      }
   },
   "sort" : {
        "_script" : {
            "script" : "for(i:scoring) { if(doc[\"id\"].value == i.id) return i.score; } return 0;",
            "type" : "number",
            "params" : {
                "scoring" : [
                    { "id": 1, "score": 1 },
                    { "id": 8, "score": 2 },
                    { "id": 2, "score": 3 },
                    { "id": 5, "score": 4 }
                ]
            },
            "order" : "asc"
        }
    }
}'
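
The "scoring" list in the request above does not have to be written by hand; it can be generated from the already-ordered ID array. The following is a minimal Python sketch of that step, not part of the original answer. The function name build_ordered_search and its doc_type and id_field parameters are hypothetical names chosen for illustration, and it only builds the request body (sending it to Elasticsearch is left out).

```python
import json


def build_ordered_search(ids, doc_type="docs", id_field="id"):
    """Build a search body that asks for `ids` back in the order given.

    Hypothetical helper: it mirrors the curl request above and assumes
    the same old-style "filter" / "_script" sort syntax.
    """
    # The position in the input list becomes the sort key (1-based, as above).
    scoring = [
        {"id": doc_id, "score": rank}
        for rank, doc_id in enumerate(ids, start=1)
    ]
    # Same script as in the curl request, with the field name filled in.
    script = (
        'for(i:scoring) { if(doc["%s"].value == i.id) return i.score; } return 0;'
        % id_field
    )
    return {
        "filter": {"ids": {"type": doc_type, "values": list(ids)}},
        "sort": {
            "_script": {
                "script": script,
                "type": "number",
                "params": {"scoring": scoring},
                "order": "asc",
            }
        },
    }


body = build_ordered_search([1, 8, 2, 5])
print(json.dumps(body, indent=2))
```

Using the list position as the score keeps the mapping trivial: whatever order the backend produced is the order the sort will reproduce.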

And the documents are returned in the correct order:

{
   "took": 11,
   "timed_out": false,
   "_shards": {
      "total": 2,
      "successful": 2,
      "failed": 0
   },
   "hits": {
      "total": 4,
      "max_score": null,
      "hits": [
         {
            "_index": "test_index",
            "_type": "docs",
            "_id": "1",
            "_score": null,
            "_source": {
               "name": "Document 1",
               "id": 1
            },
            "sort": [
               1
            ]
         },
         {
            "_index": "test_index",
            "_type": "docs",
            "_id": "8",
            "_score": null,
            "_source": {
               "name": "Document 8",
               "id": 8
            },
            "sort": [
               2
            ]
         },
         {
            "_index": "test_index",
            "_type": "docs",
            "_id": "2",
            "_score": null,
            "_source": {
               "name": "Document 2",
               "id": 2
            },
            "sort": [
               3
            ]
         },
         {
            "_index": "test_index",
            "_type": "docs",
            "_id": "5",
            "_score": null,
            "_source": {
               "name": "Document 5",
               "id": 5
            },
            "sort": [
               4
            ]
         }
      ]
   }
}
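
Because the ordering is applied inside Elasticsearch, paging the request (for example with the standard "from" and "size" search parameters) should return each page in input order, which is the concern raised in the question. The sketch below is only a local Python simulation of the script's logic against the dummy data; it does not talk to Elasticsearch, and the function names script_sort_value and simulate_page are made up for illustration.

```python
def script_sort_value(doc, scoring):
    """Python rendering of the sort script: listed score, or 0 if not listed."""
    for entry in scoring:
        if doc["id"] == entry["id"]:
            return entry["score"]
    return 0


def simulate_page(docs, ordered_ids, start=0, size=2):
    """Filter to the requested IDs, sort ascending by script value, take one page."""
    scoring = [
        {"id": doc_id, "score": rank}
        for rank, doc_id in enumerate(ordered_ids, start=1)
    ]
    wanted = set(ordered_ids)
    matched = [d for d in docs if d["id"] in wanted]
    matched.sort(key=lambda d: script_sort_value(d, scoring))
    return [d["id"] for d in matched[start:start + size]]


# The same ten dummy documents that were indexed above.
docs = [{"name": "Document %d" % n, "id": n} for n in range(1, 11)]

print(simulate_page(docs, [1, 8, 2, 5], start=0, size=2))  # [1, 8]
print(simulate_page(docs, [1, 8, 2, 5], start=2, size=2))  # [2, 5]
```

The first page holds the first two IDs of the input array and the second page the next two, so no reordering is needed on the client side.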

Here is a runnable example: http://sense.qbox.io/gist/01b28e5c038c785f0844abb7c01a71d69a32a2f4