Question

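We have a problem with our mappings in Elasticsearch 1.7. I am fixing it by creating a new index with the correct mappings. Since I'm creating a new index, I know I have to reindex the existing data from the old index into the one I just created. The problem is that I've searched Google and can't find a way to reindex from the old index to the new one. It seems the reindex API was introduced in ES 2.3 and isn't supported in 1.7.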

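My question is: how do I reindex the data from the old index into the new one after fixing the mapping? Or, more generally, what is the best practice for making mapping changes in ES 1.7?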

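https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html doesn't work for me, because we're on an old version of ES (1.7).
https://www.elastic.co/blog/changing-mapping-with-zero-downtime: I initially went down that road but got stuck, because I still need a way to reindex the old index.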

Answer 1

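This is late for your use case, but hopefully it helps other users. Here is an excellent step-by-step guide on reindexing an Elasticsearch index with Logstash 1.5 while keeping the original data intact: http://david.pilato.fr/blog/2015/05/20/reindex-elasticsearch-with-logstash/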

Here is the logstash-simple.conf the author created:

input {
  # We read from the "old" cluster
  elasticsearch {
    hosts => [ "localhost" ]
    port => "9200"
    index => "index"
    size => 500
    scroll => "5m"
    docinfo => true
  }
}

filter {
  mutate {
    remove_field => [ "@timestamp", "@version" ]
  }
}

output {
  # We write to the "new" cluster
  elasticsearch {
    host => "localhost"
    port => "9200"
    protocol => "http"
    index => "new_index"
    index_type => "%{[@metadata][_type]}"
    document_id => "%{[@metadata][_id]}"
  }
  # We print dots to see it in action
  stdout {
    codec => "dots"
  }
}

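To make it clearer what that pipeline does under the hood, here is a minimal Python sketch of the same idea: read pages of hits from the old index and write each page to the new index through the bulk API, keeping every document's original `_type` and `_id` (which is what `docinfo => true` plus `index_type`/`document_id` achieve in the config). It assumes you fetch the pages yourself (in 1.x, typically with a `search_type=scan` scroll) and POST each body to `/_bulk`; the names `hits_to_bulk_body`, `reindex`, `fetch_pages` and `send_bulk` are hypothetical, and the HTTP calls are injected as plain callables so the core logic stays testable.

```python
import json
from typing import Any, Callable, Dict, Iterable, List

Hit = Dict[str, Any]


def hits_to_bulk_body(hits: Iterable[Hit], new_index: str) -> str:
    """Build a _bulk request body that copies each hit into new_index.

    The original _type and _id are kept, mirroring docinfo => true plus
    index_type/document_id in the Logstash config.
    """
    lines: List[str] = []
    for hit in hits:
        action = {"index": {"_index": new_index, "_type": hit["_type"], "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(hit["_source"]))
    # The bulk API expects newline-delimited JSON that ends with a newline.
    return "\n".join(lines) + "\n" if lines else ""


def reindex(
    fetch_pages: Callable[[], Iterable[List[Hit]]],
    send_bulk: Callable[[str], None],
    new_index: str,
) -> int:
    """Copy every page of hits into new_index; return the number of documents sent."""
    total = 0
    for hits in fetch_pages():
        if not hits:  # an empty scroll page means the old index is exhausted
            break
        send_bulk(hits_to_bulk_body(hits, new_index))
        total += len(hits)
    return total
```

Unlike the Logstash route, nothing here adds `@timestamp`/`@version` to the documents, so no equivalent of the `mutate` filter is needed.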
Answer 2

There are a few options:

Use Logstash: it is very easy to write a reindex config in Logstash and use it to reindex your documents. For example:

input {
  elasticsearch {
    hosts => [ "localhost" ]
    port => "9200"
    index => "index1"
    size => 1000
    scroll => "5m"
    docinfo => true
  }
}
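
filter {
  # Drop the fields Logstash adds to every event so they don't end up in index2
  mutate {
    remove_field => [ "@timestamp", "@version" ]
  }
}
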
output {
  elasticsearch {
    host => "localhost"
    port => "9200"
    protocol => "http"
    index => "index2"
    index_type => "%{[@metadata][_type]}"
    document_id => "%{[@metadata][_id]}"
  }
}

The problem with this approach is that it is relatively slow, because the whole reindexing process runs on a single machine.

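Another option is to use this tool. It will be faster than Logstash, but you have to supply segmentation logic covering all of your documents to speed things up. For example, if you have a numeric field whose values range from 1 to 100, you can split the query in the tool into, say, 10 intervals (1-10, 11-20, ..., 91-100), and the tool will spawn 10 indexers that reindex the old index in parallel.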
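The segmentation idea can be sketched in Python as follows. This is not the tool's own code or API; it assumes an integer field (called `n` in the usage example, a hypothetical name), turns each interval into a standard `range` query with `gte`/`lte` bounds, and hands each query to a worker. The worker is a hypothetical callable you would implement yourself, e.g. a scroll-and-bulk loop restricted to that query, returning how many documents it copied.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Tuple

Segment = Tuple[int, int]


def make_segments(lo: int, hi: int, n: int) -> List[Segment]:
    """Split the inclusive integer range [lo, hi] into at most n contiguous segments."""
    if n < 1 or hi < lo:
        raise ValueError("need n >= 1 and lo <= hi")
    size, extra = divmod(hi - lo + 1, n)
    segments: List[Segment] = []
    start = lo
    for i in range(n):
        # Spread any remainder over the first segments so sizes differ by at most 1.
        length = size + (1 if i < extra else 0)
        if length == 0:
            break  # fewer values than segments requested
        segments.append((start, start + length - 1))
        start += length
    return segments


def range_query(field: str, start: int, end: int) -> Dict[str, Any]:
    """Query body selecting documents whose field lies in [start, end]."""
    return {"query": {"range": {field: {"gte": start, "lte": end}}}}


def reindex_in_parallel(
    field: str,
    segments: List[Segment],
    worker: Callable[[Dict[str, Any]], int],
    max_workers: int = 10,
) -> int:
    """Run the worker once per segment query concurrently; return the total documents copied."""
    queries = [range_query(field, s, e) for s, e in segments]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(worker, queries))
```

With `make_segments(1, 100, 10)` this yields exactly the 10 intervals from the example above. Threads are a reasonable choice here because each worker spends most of its time waiting on HTTP calls to Elasticsearch; the segments only need to cover every document once, so make sure the bounds span the field's full range.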

Reindexing in Elasticsearch 1.7
