ElasticSearch: retrieving a string concatenation or a partial array

Posted: 2015-05-17 09:19:04

Tags: arrays elasticsearch

I have many indexed documents, for example:

{
   "_index":"myindex",
   "_type":"somedata",
   "_id":"31d3255d-67b4-40e6-b9d4-637383eb72ad",
   "_version":1,
   "_score":1,
   "_source":{
      "otherID":"b4c95332-daed-49ae-99fe-c32482696d1c",
      "data":[
         {
            "data":"d2454d41-a74e-43af-b3b0-0febeaf67a99",
            "iD":"9362f2eb-9bd7-4924-8b0e-77c27bb0aa56"
         },
         {
            "data":"some text",
            "iD":"c554b8ce-c873-4fef-b306-ec65d2f40394"
         },
         {
            "data":"5256983c-ef69-4363-9787-97074297c646",
            "iD":"8c90e2be-6042-4450-b0fd-0732900f8f65"
         },
         {
            "data":"other text",
            "iD":"8d8f8a61-02d6-4d3e-9912-9ebb5d213c15"
         },
         {
            "data":"3",
            "iD":"c880bfdf-eb4b-4c80-9871-fd44e06b2ed2"
         }
      ],
      "iD":"31d3255d-67b4-40e6-b9d4-637383eb72ad"
   }
}  

Its type mapping is configured like this:

{
   "somedata":{
      "dynamic_templates":[
         {
            "defaultIDs":{
               "match_pattern":"regex",
               "mapping":{
                  "index":"not_analyzed",
                  "type":"string"
               },
               "match":".*(id|ID|iD)"
            }
         }
      ],
      "properties":{
         "otherID":{
            "index":"not_analyzed",
            "type":"string"
         },
         "data":{
            "properties":{
               "data":{
                  "type":"string"
               },
               "iD":{
                  "index":"not_analyzed",
                  "type":"string"
               }
            }
         },
         "iD":{
            "index":"not_analyzed",
            "type":"string"
         }
      }
   }
}  

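I would like to retrieve a concatenation of the data strings, selected by their IDs. For example, given the IDs c554b8ce-c873-4fef-b306-ec65d2f40394 and 8d8f8a61-02d6-4d3e-9912-9ebb5d213c15, I want to retrieve "some text other text".
These IDs are repeated in other documents of the same type, with different data.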

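If that is not possible (which I suspect is the case), I would like at least to retrieve a partial array containing only the data I asked for.
Those arrays can get large (and so will the number of documents), and each hit only needs one or two elements.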

If neither is possible, how would you suggest changing my mapping to meet my needs?

Thanks in advance, Jonathan.

2 answers:

Answer 0 (score: 3)

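I found a way to do exactly what I needed without changing the data structure (I did end up changing my data structure anyway, but for space and efficiency reasons).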

All you need is the Groovy goodness that ElasticSearch provides:

{
    "query" : { "term" : { "otherID" : "b4c95332-daed-49ae-99fe-c32482696d1c" } },
    "script_fields" : { "requestedFields" : { "script" : "_source.data.findAll({ it.iD == 'c554b8ce-c873-4fef-b306-ec65d2f40394' || it.iD == '8d8f8a61-02d6-4d3e-9912-9ebb5d213c15' }).data.join(' ')" } }
}
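To make the script's logic explicit: findAll keeps the array entries whose iD matches, .data collects their data values (in array order), and join(' ') concatenates them. Below is a minimal Python sketch of the same filter-and-join, applied client-side to a document's _source. The helper name concat_data is hypothetical, and this runs outside Elasticsearch; it only mirrors what the Groovy script computes.

```python
# Client-side mirror of the Groovy script_field above (hypothetical helper):
# keep entries whose "iD" is requested, then join their "data" values.


def concat_data(source, wanted_ids, sep=" "):
    """Join the "data" strings of entries whose "iD" is in wanted_ids.

    Array order is preserved, like Groovy's findAll followed by join.
    """
    wanted = set(wanted_ids)
    return sep.join(
        entry["data"]
        for entry in source.get("data", [])
        if entry.get("iD") in wanted
    )


if __name__ == "__main__":
    source = {
        "otherID": "b4c95332-daed-49ae-99fe-c32482696d1c",
        "data": [
            {"data": "d2454d41-a74e-43af-b3b0-0febeaf67a99",
             "iD": "9362f2eb-9bd7-4924-8b0e-77c27bb0aa56"},
            {"data": "some text",
             "iD": "c554b8ce-c873-4fef-b306-ec65d2f40394"},
            {"data": "other text",
             "iD": "8d8f8a61-02d6-4d3e-9912-9ebb5d213c15"},
        ],
    }
    print(concat_data(source, ["c554b8ce-c873-4fef-b306-ec65d2f40394",
                               "8d8f8a61-02d6-4d3e-9912-9ebb5d213c15"]))
```

With the sample document this prints "some text other text", the string the question asks for.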

Just goes to show how awesome ElasticSearch is.

Answer 1 (score: 1)

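I can't help you with the field concatenation (maybe it can be done with a script, but I don't have enough experience with that; I'd assume a new field would have to be generated, etc.), but I can show you how to retrieve only part of the data.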

It requires at least ES 1.5, because it uses inner_hits, and you need to change your mapping.

I've added "type": "nested" and "include_in_parent": true to your data field:

DELETE somedata
PUT somedata
PUT somedata/sometype/_mapping
{
   "sometype":{
      "dynamic_templates":[
         {
            "defaultIDs":{
               "match_pattern":"regex",
               "mapping":{
                  "index":"not_analyzed",
                  "type":"string"
               },
               "match":".*(id|ID|iD)"
            }
         }
      ],
      "properties":{
         "otherID":{
            "index":"not_analyzed",
            "type":"string"
         },
         "data":{
            "type": "nested",
            "include_in_parent": true,
            "properties":{
               "data":{
                  "type":"string"
               },
               "iD":{
                  "index":"not_analyzed",
                  "type":"string"
               }
            }
         },
         "iD":{
            "index":"not_analyzed",
            "type":"string"
         }
      }
   }
}  
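Why "type": "nested" matters here: with the plain object mapping, Elasticsearch indexes an array of objects as independent multi-valued fields (data.data and data.iD), so the index no longer records which data value belonged to which iD, and there is no per-element unit to return. The Python below is a simplified model of that flattening, not Elasticsearch's actual indexing code; the function name is hypothetical. With "nested", each array element is indexed as its own hidden document instead, and "include_in_parent": true additionally keeps these flattened fields on the parent document.

```python
# Simplified model (hypothetical helper) of how an object array is indexed
# WITHOUT "type": "nested": each sub-field becomes one multi-valued field,
# so the pairing between a "data" value and its "iD" is not preserved.


def flatten_object_array(field, items):
    """Collect every sub-field of `items` into a list keyed by "field.subfield"."""
    flat = {}
    for item in items:
        for key, value in item.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


if __name__ == "__main__":
    items = [
        {"data": "some text", "iD": "c554b8ce-c873-4fef-b306-ec65d2f40394"},
        {"data": "other text", "iD": "8d8f8a61-02d6-4d3e-9912-9ebb5d213c15"},
    ]
    # Two parallel lists: nothing links "some text" to its own iD any more.
    print(flatten_object_array("data", items))
```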

Now index your document:

PUT somedata/sometype/1
{
   "otherID":"b4c95332-daed-49ae-99fe-c32482696d1c",
   "data":[
      {
         "data":"d2454d41-a74e-43af-b3b0-0febeaf67a99",
         "iD":"9362f2eb-9bd7-4924-8b0e-77c27bb0aa56"
      },
      {
         "data":"some text",
         "iD":"c554b8ce-c873-4fef-b306-ec65d2f40394"
      },
      {
         "data":"5256983c-ef69-4363-9787-97074297c646",
         "iD":"8c90e2be-6042-4450-b0fd-0732900f8f65"
      },
      {
         "data":"other text",
         "iD":"8d8f8a61-02d6-4d3e-9912-9ebb5d213c15"
      },
      {
         "data":"3",
         "iD":"c880bfdf-eb4b-4c80-9871-fd44e06b2ed2"
      }
   ],
   "iD":"31d3255d-67b4-40e6-b9d4-637383eb72ad"
}

Here is how to match and retrieve the elements with inner_hits:
POST somedata/sometype/_search
{
  "query": {
    "nested": {
      "path": "data",
      "query": {
        "bool": {
          "should": [
            {
            "term": {
              "data.iD": "c554b8ce-c873-4fef-b306-ec65d2f40394"
            }
            },
            {
            "term": {
              "data.iD": "8d8f8a61-02d6-4d3e-9912-9ebb5d213c15"
            }
            }
          ]
        }
      },
      "inner_hits": {}
    }
  }
}
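Since the requested IDs change from call to call, the request body can be generated instead of written by hand. A minimal Python sketch that builds the same nested + bool/should + inner_hits body for any list of IDs; the function name and its size parameter are hypothetical. The optional size is there because, as far as I know, inner_hits returns at most three matching nested documents by default, so asking for more IDs than that would need a larger size.

```python
import json

# Hypothetical helper: build the nested/inner_hits query body shown above
# for an arbitrary list of iD values (one "term" clause per ID).


def build_inner_hits_query(ids, path="data", id_field="data.iD", size=None):
    should = [{"term": {id_field: i}} for i in ids]
    inner_hits = {} if size is None else {"size": size}
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"bool": {"should": should}},
                "inner_hits": inner_hits,
            }
        }
    }


if __name__ == "__main__":
    body = build_inner_hits_query([
        "c554b8ce-c873-4fef-b306-ec65d2f40394",
        "8d8f8a61-02d6-4d3e-9912-9ebb5d213c15",
    ])
    # Send this JSON as the body of POST somedata/sometype/_search.
    print(json.dumps(body, indent=2))
```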

Now look at hits.hits[0].inner_hits.data.hits.hits in the result: it contains only the two matches you asked for, each with a single array element in its _source:

{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.5986179,
      "hits": [
         {
            "_index": "somedata",
            "_type": "sometype",
            "_id": "1",
            "_score": 0.5986179,
            "_source": {
               "otherID": "b4c95332-daed-49ae-99fe-c32482696d1c",
               "data": [
                  {
                     "data": "d2454d41-a74e-43af-b3b0-0febeaf67a99",
                     "iD": "9362f2eb-9bd7-4924-8b0e-77c27bb0aa56"
                  },
                  {
                     "data": "some text",
                     "iD": "c554b8ce-c873-4fef-b306-ec65d2f40394"
                  },
                  {
                     "data": "5256983c-ef69-4363-9787-97074297c646",
                     "iD": "8c90e2be-6042-4450-b0fd-0732900f8f65"
                  },
                  {
                     "data": "other text",
                     "iD": "8d8f8a61-02d6-4d3e-9912-9ebb5d213c15"
                  },
                  {
                     "data": "3",
                     "iD": "c880bfdf-eb4b-4c80-9871-fd44e06b2ed2"
                  }
               ],
               "iD": "31d3255d-67b4-40e6-b9d4-637383eb72ad"
            },
            "inner_hits": {
               "data": {
                  "hits": {
                     "total": 2,
                     "max_score": 0.5986179,
                     "hits": [
                        {
                           "_index": "somedata",
                           "_type": "sometype",
                           "_id": "1",
                           "_nested": {
                              "field": "data",
                              "offset": 3
                           },
                           "_score": 0.5986179,
                           "_source": {
                              "data": "other text",
                              "iD": "8d8f8a61-02d6-4d3e-9912-9ebb5d213c15"
                           }
                        },
                        {
                           "_index": "somedata",
                           "_type": "sometype",
                           "_id": "1",
                           "_nested": {
                              "field": "data",
                              "offset": 1
                           },
                           "_score": 0.5986179,
                           "_source": {
                              "data": "some text",
                              "iD": "c554b8ce-c873-4fef-b306-ec65d2f40394"
                           }
                        }
                     ]
                  }
               }
            }
         }
      ]
   }
}
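On the client side, the partial array is then just a walk over that path. A minimal Python sketch, assuming the response has already been parsed from JSON into a dict; the helper name is hypothetical.

```python
# Hypothetical helper: collect the _source of every inner hit under `path`
# (hits.hits[*].inner_hits.<path>.hits.hits[*]._source) from a parsed response.


def matched_entries(response, path="data"):
    entries = []
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(path, {})
        for inner_hit in inner.get("hits", {}).get("hits", []):
            entries.append(inner_hit["_source"])
    return entries


if __name__ == "__main__":
    # Trimmed-down copy of the response above.
    response = {"hits": {"hits": [{"inner_hits": {"data": {"hits": {"hits": [
        {"_nested": {"field": "data", "offset": 3},
         "_source": {"data": "other text", "iD": "8d8f8a61-02d6-4d3e-9912-9ebb5d213c15"}},
        {"_nested": {"field": "data", "offset": 1},
         "_source": {"data": "some text", "iD": "c554b8ce-c873-4fef-b306-ec65d2f40394"}},
    ]}}}}]}}
    for entry in matched_entries(response):
        print(entry["iD"], entry["data"])
```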

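Now, inner_hits is fairly new, and the documentation also states:

Warning: This functionality is experimental and may be changed or removed completely in a future release.

YMMV.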

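One more thing to note: inner_hits are sorted by score. In the original document the elements sit in an ordered array, but that order is not reflected in the order of the results. If you need the same order in inner_hits, I think you would have to add a separate field to sort on (perhaps just the array index...) and sort inner_hits by it.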
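As a client-side alternative to a separate sort field (my own suggestion, not part of the answer): the response above already reports each inner hit's array position in _nested.offset, so the original order can be restored after the fact, which also yields the concatenation from the question. A minimal Python sketch under that assumption; the helper name is hypothetical.

```python
# Hypothetical helper: reorder one hit's inner hits by their _nested.offset
# (the element's position in the original array), then return the values.


def ordered_inner_values(hit, path="data", field="data"):
    inner_hits = hit["inner_hits"][path]["hits"]["hits"]
    ordered = sorted(inner_hits, key=lambda h: h["_nested"]["offset"])
    return [h["_source"][field] for h in ordered]


if __name__ == "__main__":
    # The single hit from the response above, trimmed to the relevant parts;
    # the inner hits arrive in score order (offset 3 before offset 1).
    hit = {"inner_hits": {"data": {"hits": {"hits": [
        {"_nested": {"field": "data", "offset": 3},
         "_source": {"data": "other text", "iD": "8d8f8a61-02d6-4d3e-9912-9ebb5d213c15"}},
        {"_nested": {"field": "data", "offset": 1},
         "_source": {"data": "some text", "iD": "c554b8ce-c873-4fef-b306-ec65d2f40394"}},
    ]}}}}
    print(" ".join(ordered_inner_values(hit)))
```

This prints "some text other text". It only reorders what was returned, so it still depends on inner_hits returning every requested element.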