Elasticsearch: reindexing a multi-type parent/child index (v5.0) into a join-type index (v6.2)

Time: 2018-03-19 14:13:16

Tags: elasticsearch elasticsearch-5 elasticsearch-painless

I am reindexing data from ES 5.0 (parent/child) into ES 6.2 (join type).

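In the ES 5.0 index the data is stored as parent and child documents in separate types. For the reindex, I have already created a new index and mapping on 6.2 in my new cluster.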

The parent documents reindex into the new index without problems, but every child document fails with the following error:

{
  "index": "index_two",
  "type": "_doc",
  "id": "AVpisCkMuwDYFnQZiFXl",
  "cause": {
    "type": "mapper_parsing_exception",
    "reason": "failed to parse",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "[routing] is missing for join field [field_relationship]"
    }
  },
  "status": 400
}

The request used to reindex the data:

{
  "source": {
    "remote": {
      "host": "http://myescluster.com:9200",
      "socket_timeout": "1m",
      "connect_timeout": "20s"
    },
    "index": "index_two",
    "type": ["actions"],
    "size": 5000,
    "query":{
        "bool":{
            "must":[
                {"term": {"client_id.raw": "cl14ous0ydao"}}
            ]
        }
    }
  },
  "dest": {
    "index": "index_two",
    "type": "_doc"
  },
  "script": {
    "params": {
        "jdata": {
            "name": "actions"
        }
    },
    "source": "ctx._routing=ctx._routing;ctx.remove('_parent');params.jdata.parent=ctx._source.user_id;ctx._source.field_relationship=params.jdata"
  }
}

I pass the routing field in the Painless script because its value differs for each document in the source index.

Mapping of the destination index:

{
  "index_two": {
    "mappings": {
      "_doc": {
        "dynamic_templates": [
          {
            "template_actions": {
              "match_mapping_type": "string",
              "mapping": {
                "fields": {
                  "raw": {
                    "index": true,
                    "ignore_above": 256,
                    "type": "keyword"
                  }
                },
                "type": "text"
              }
            }
          }
        ],
        "date_detection": false,
        "properties": {
          "attributes": {
            "type": "nested"
          },
          "cl_other_params": {
            "type": "nested"
          },
          "cl_triggered_ts": {
            "type": "date"
          },
          "cl_utm_params": {
            "type": "nested"
          },
          "end_ts": {
            "type": "date"
          },
          "field_relationship": {
            "type": "join",
            "eager_global_ordinals": true,
            "relations": {
              "users": [
                "actions",
                "segments"
              ]
            }
          },
          "ip_address": {
            "type": "ip"
          },
          "location": {
            "type": "geo_point"
          },
          "processed_ts": {
            "type": "date"
          },
          "processing_time": {
            "type": "date"
          },
          "products": {
            "type": "nested",
            "properties": {
              "traits": {
                "type": "nested"
              }
            }
          },
          "segment_id": {
            "type": "integer"
          },
          "start_ts": {
            "type": "date"
          }
        }
      }
    }
  }
}
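
To make the mapping above concrete, here is a minimal Python sketch of the values the `field_relationship` join field would take for parent and child documents under the `relations` shown. The helper `join_value` is hypothetical and exists only for illustration; it is not part of any Elasticsearch client.

```python
# Relations copied from the destination mapping: "users" is the parent,
# "actions" and "segments" are its children.
RELATIONS = {"users": ["actions", "segments"]}


def join_value(name, parent_id=None, relations=RELATIONS):
    """Build the join field value for one document (hypothetical helper)."""
    children = {child for kids in relations.values() for child in kids}
    if name in children:
        # A child must name its parent document.
        if parent_id is None:
            raise ValueError(f"child relation {name!r} needs a parent id")
        return {"name": name, "parent": parent_id}
    if name in relations:
        # A top-level parent carries only its relation name.
        if parent_id is not None:
            raise ValueError(f"parent relation {name!r} must not have a parent id")
        return {"name": name}
    raise ValueError(f"unknown relation {name!r}")
```

A child document also has to be indexed with a routing value so that it lands on the same shard as its parent, which is what the error message in the question is complaining about.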

A sample source document:

  {
    "_index": "index_two",
    "_type": "actions",
    "_id": "AVvKUYcceQCc2OyLKWZ9",
    "_score": 7.4023576,
    "_routing": "cl14ous0ydaob71ab2a1-837c-4904-a755-11e13410fb94",
    "_parent": "cl14ous0ydaob71ab2a1-837c-4904-a755-11e13410fb94",
    "_source": {
      "user_id": "cl14ous0ydaob71ab2a1-837c-4904-a755-11e13410fb94",
      "client_id": "cl14ous0ydao",
      "session_id": "CL-e0ec3941-6dad-4d2d-bc9b",
      "source": "betalist",
      "action": "pageview",
      "action_type": "pageview",
      "device": "Desktop",
      "ip_address": "49.35.14.224",
      "location": "20.7333 , 77",
      "attributes": [
        {
          "key": "url",
          "value": "https://www.google.com/",
          "type": "string"
        }
      ],
      "products": []
    }
  }

2 answers:

Answer 0: (score: 0)

I ran into the same problem and, after searching the Elasticsearch discussion forum, found that this works:

POST _reindex

{
    "source": {
        "index": "old_index",
        "type": "actions"
    },
    "dest": {
        "index": "index_two"
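    },
    "script": {
        "source": """

            // Write every document to the single type of the 6.x index.
            ctx._type = "_doc";

            // In the sample document, user_id holds the parent id.
            String routingCode = ctx._source.user_id;
            Map join = new HashMap();
            join.put('name', 'actions');
            join.put('parent', routingCode);

            // Attach the join field defined in the destination mapping.
            ctx._source.put('field_relationship', join);

            // Drop the 5.x _parent metadata.
            ctx._parent = null;

            // Route the child by its parent id.
            ctx._routing = new StringBuffer(routingCode)"""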
    }
}
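
For readers who want to check the transformation outside Elasticsearch, here is a rough Python sketch of what the Painless script above does to a single 5.x hit. The function `to_join_doc` and the plain-dict layout of `ctx` are assumptions made for illustration; they mimic the script and are not how the reindex API is actually invoked.

```python
import copy


def to_join_doc(hit, relation="actions", join_field="field_relationship"):
    """Mimic the reindex script on a plain dict (illustrative only)."""
    ctx = copy.deepcopy(hit)          # leave the caller's hit untouched
    parent_id = ctx["_source"]["user_id"]

    ctx["_type"] = "_doc"             # single type in the 6.x index
    ctx["_source"][join_field] = {"name": relation, "parent": parent_id}
    ctx.pop("_parent", None)          # 5.x parent metadata no longer applies
    ctx["_routing"] = parent_id       # route by the parent id
    return ctx


# Trimmed version of the sample document from the question.
sample_hit = {
    "_index": "index_two",
    "_type": "actions",
    "_id": "AVvKUYcceQCc2OyLKWZ9",
    "_routing": "cl14ous0ydaob71ab2a1-837c-4904-a755-11e13410fb94",
    "_parent": "cl14ous0ydaob71ab2a1-837c-4904-a755-11e13410fb94",
    "_source": {
        "user_id": "cl14ous0ydaob71ab2a1-837c-4904-a755-11e13410fb94",
        "client_id": "cl14ous0ydao",
        "action": "pageview",
    },
}

converted = to_join_doc(sample_hit)
```

The result keeps the original `_id` and `_source` fields, gains the `field_relationship` object, and no longer carries `_parent`.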

Hope this helps :).

Answer 1: (score: 0)

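I would like to point out that a join field does not normally require routing, but you will hit this problem if a child is created before its parent.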

The recommendation is to reindex all the parents first, and then reindex the children.
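
One way to follow this advice is to issue one `_reindex` request per relation, parents first. The sketch below only builds the request bodies in that order and does not send them. Several things here are assumptions: the source index name `old_index` is borrowed from the first answer, the 5.x type names are assumed to match the relation names in the mapping, child documents are assumed to carry their parent id in `user_id` as the sample action does, and `build_reindex_bodies` is a hypothetical helper.

```python
# Painless for parent documents: only the relation name is set.
PARENT_SCRIPT = """
ctx._type = '_doc';
Map join = new HashMap();
join.put('name', params.relation);
ctx._source.put('field_relationship', join);
"""

# Painless for child documents, following the approach of the first answer.
CHILD_SCRIPT = """
ctx._type = '_doc';
String parentId = ctx._source.user_id;
Map join = new HashMap();
join.put('name', params.relation);
join.put('parent', parentId);
ctx._source.put('field_relationship', join);
ctx._parent = null;
ctx._routing = new StringBuffer(parentId);
"""


def build_reindex_bodies(relations, source_index="old_index", dest_index="index_two"):
    """Return (relation, body) pairs with every parent before any child."""

    def body(relation, script):
        return {
            "source": {"index": source_index, "type": relation},
            "dest": {"index": dest_index},
            "script": {
                "lang": "painless",
                "source": script,
                "params": {"relation": relation},
            },
        }

    parents = [(p, body(p, PARENT_SCRIPT)) for p in relations]
    children = [
        (c, body(c, CHILD_SCRIPT)) for kids in relations.values() for c in kids
    ]
    return parents + children


ordered = build_reindex_bodies({"users": ["actions", "segments"]})
```

Each body in `ordered` would be sent to `POST _reindex` in sequence, waiting for the parent request to finish before starting the child requests.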