Question

Let's say I have movie data in my ElasticSearch, and I created the documents like this:

curl -XPUT "http://192.168.0.2:9200/movies/movie/1" -d'
{
    "title": "The Godfather",
    "director": "Francis Ford Coppola",
    "year": 1972
}'

I have a bunch of movies from different years. I want to copy all the movies from a particular year (say, 1972) into a new index called "70sMovies", but I can't see how to do that.

Answer 1

As of ElasticSearch 2.3, you can use the built-in _reindex API.

For example:

POST /_reindex
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter"
  }
}

Or copy only a specific subset by adding a filter/query:

POST /_reindex
{
  "source": {
    "index": "twitter",
    "query": {
      "term": {
        "user": "kimchy"
      }
    }
  },
  "dest": {
    "index": "new_twitter"
  }
}

Read more: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
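Applied to the question's data, the filtered form might look like the sketch below. It is written in Python with only the standard library. The helper names `build_reindex_body` and `send_reindex` and the destination name `70smovies` are assumptions for illustration, not part of the API. The destination is lowercased because Elasticsearch rejects index names that contain uppercase letters, so `70sMovies` would not be accepted as written.

```python
import json
import urllib.request


def build_reindex_body(source_index, dest_index, year):
    """Build a _reindex request body that copies only documents from one year."""
    return {
        "source": {
            "index": source_index,
            # term query: exact match on the numeric "year" field
            "query": {"term": {"year": year}},
        },
        "dest": {"index": dest_index},
    }


def send_reindex(base_url, body):
    """POST the body to /_reindex and return the decoded JSON response.

    Not called in this sketch; it needs a reachable cluster (2.3 or later).
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + "/_reindex",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical names: "movies" is the question's index, "70smovies" the target
body = build_reindex_body("movies", "70smovies", 1972)
print(json.dumps(body, indent=2))
```

With a cluster available, something like `send_reindex("http://192.168.0.2:9200", body)` would submit the request; the printed body can also be pasted into a console request as-is.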

Answer 2

The best approach is to use the elasticsearch-dump tool: https://github.com/taskrabbit/elasticsearch-dump.

A real-world example that I have used:

elasticdump \
  --input=http://localhost:9700/.kibana \
  --output=http://localhost:9700/.kibana_read_only \
  --type=mapping
elasticdump \
  --input=http://localhost:9700/.kibana \
  --output=http://localhost:9700/.kibana_read_only \
  --type=data

Answer 3

Check out knapsack: https://github.com/jprante/elasticsearch-knapsack

Once the plugin is installed and running, you can export part of an index by query. For example:

curl -XPOST 'localhost:9200/test/test/_export' -d '{
"query" : {
    "match" : {
        "myfield" : "myvalue"
    }
},
"fields" : [ "_parent", "_source" ]
}'

This creates a tarball containing only the query results, which you can then import into another index.

Answer 4

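The straightforward way to do this is to write code, using the API of your choice, that queries for "year": 1972 and then indexes that data into the new index. You can use the Search API or the Scan and Scroll API to fetch all the documents, and then index them one by one or with the Bulk API: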

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-search.html

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html
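As a rough sketch of the second half of that approach, the Python below turns a page of search or scroll hits into a Bulk API request body (newline-delimited JSON: one action line followed by one document line). The function name `hits_to_bulk_body`, the sample hit, and the target name `70smovies` are hypothetical; fetching the hits and sending the body to `/_bulk` are left out.

```python
import json


def hits_to_bulk_body(hits, dest_index, dest_type=None):
    """Convert search hits into an NDJSON body for POST /_bulk."""
    lines = []
    for hit in hits:
        action = {"_index": dest_index, "_id": hit["_id"]}
        if dest_type is not None:
            # Only for Elasticsearch versions that still use mapping types
            action["_type"] = dest_type
        lines.append(json.dumps({"index": action}))
        lines.append(json.dumps(hit["_source"]))
    if not lines:
        return ""
    # The Bulk API requires the body to end with a newline
    return "\n".join(lines) + "\n"


# Sample hit shaped like an entry of hits.hits in a search response
sample_hits = [
    {
        "_id": "1",
        "_source": {
            "title": "The Godfather",
            "director": "Francis Ford Coppola",
            "year": 1972,
        },
    },
]
bulk_body = hits_to_bulk_body(sample_hits, "70smovies", dest_type="movie")
print(bulk_body, end="")
```

Reusing each hit's `_id` keeps document IDs stable across the two indices, so rerunning the copy overwrites documents instead of duplicating them.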

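Assuming you don't want to do this in code and are looking for a direct way, I suggest Elasticsearch Snapshot and Restore. Basically, you take a snapshot of the existing index, restore it as a new index, and then use the delete command to remove all documents from years other than 1972.

Snapshot and Restore

The snapshot and restore module allows creating snapshots of individual indices or an entire cluster into a remote repository. At the time of the initial release only the shared file system repository was supported, but now a range of backends is available via officially supported repository plugins.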

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-snapshots.html

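Delete By Query API

The delete by query API allows deleting documents from one or more indices and one or more types based on a query. The query can either be provided using a simple query string as a parameter, or using the Query DSL defined within the request body.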

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-delete-by-query.html
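The snapshot route could be laid out as the sequence of requests sketched below, built here only as Python data and not sent. The repository name `my_backup`, the snapshot name `movies_snap`, and the target `70smovies` are placeholders, and the repository is assumed to be registered already. The `_delete_by_query` path is the form used by later Elasticsearch versions; older releases exposed delete-by-query differently, so check it against the version in use.

```python
def snapshot_copy_plan(source_index, dest_index, keep_year,
                       repo="my_backup", snapshot="movies_snap"):
    """Return (method, path, body) steps: snapshot, restore renamed, prune."""
    snapshot_path = "/_snapshot/{}/{}".format(repo, snapshot)
    return [
        # 1. Snapshot only the source index and wait until it finishes
        ("PUT", snapshot_path + "?wait_for_completion=true",
         {"indices": source_index}),
        # 2. Restore it under a different name so the original stays untouched
        ("POST", snapshot_path + "/_restore?wait_for_completion=true",
         {"indices": source_index,
          "rename_pattern": source_index,
          "rename_replacement": dest_index}),
        # 3. Delete every document whose year is NOT the one to keep
        ("POST", "/{}/_delete_by_query".format(dest_index),
         {"query": {"bool": {"must_not": {"term": {"year": keep_year}}}}}),
    ]


plan = snapshot_copy_plan("movies", "70smovies", 1972)
for method, path, body in plan:
    print(method, path, body)
```

Note the trade-off: this copies the whole index first and then deletes most of it, which is simple but does more work than copying only the matching documents.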

Answer 5

You can do this easily in three steps with elasticsearch-dump (https://github.com/taskrabbit/elasticsearch-dump). In the example below, I copy the index "thor" to "thor2":

elasticdump --input=http://localhost:9200/thor --output=http://localhost:9200/thor2 --type=analyzer

elasticdump --input=http://localhost:9200/thor --output=http://localhost:9200/thor2 --type=mapping

elasticdump --input=http://localhost:9200/thor --output=http://localhost:9200/thor2 --type=data
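The three steps above copy the whole index. To copy only part of it, as the question asks, the data step can be narrowed with elasticdump's `--searchBody` option. The Python sketch below only assembles the command lines and does not run them; the function name `elasticdump_commands` and the index names are hypothetical, and the option should be checked against the installed elasticdump version.

```python
import json
import shlex


def elasticdump_commands(source_url, dest_url, search_body=None):
    """Return the three elasticdump invocations as argv lists, in run order."""
    commands = []
    for dump_type in ("analyzer", "mapping", "data"):
        argv = [
            "elasticdump",
            "--input=" + source_url,
            "--output=" + dest_url,
            "--type=" + dump_type,
        ]
        if dump_type == "data" and search_body is not None:
            # Restrict the copied documents to those matching the query
            argv.append("--searchBody=" + json.dumps(search_body))
        commands.append(argv)
    return commands


commands = elasticdump_commands(
    "http://localhost:9200/movies",
    "http://localhost:9200/70smovies",
    search_body={"query": {"term": {"year": 1972}}},
)
for argv in commands:
    # Quote each argument so the line can be pasted into a shell
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Each argv list can also be passed directly to `subprocess.run`, which avoids shell quoting problems with the JSON query.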

Answer 6

To reindex a specific type from a source index into a type in the destination index, the syntax is

# mimic dictionary

"""
    Match each word in a string to the list of words that follow it.

    Example:
        "hi my name is chris, hi chris, hi how are you"
        {hi: ["my","chris","how"], my: ["name"],...}

"""

def mimic_dict(text):
    words = text.split()
    followers = {}

    # Pair each word with the word that comes right after it
    for current, following in zip(words, words[1:]):
        followers.setdefault(current, []).append(following)
    return followers

# google
def mimic_dict1(text):
    mimic = {}
    words = text.split()
    prev = ''  # the empty string is the key for the first word of the text
    for word in words:
        mimic.setdefault(prev, []).append(word)
        prev = word
    return mimic
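For the type-scoped reindex that this answer describes, a request body might be built as in the sketch below. It assumes an Elasticsearch version that still has mapping types, where `_reindex` accepts a `type` in both `source` and `dest`; recent versions no longer accept it. The function name `build_typed_reindex_body` and the index and type names are illustrative only, not the answer's original snippet.

```python
import json


def build_typed_reindex_body(source_index, source_type,
                             dest_index, dest_type, query=None):
    """Build a _reindex body that reads one type and writes into one type."""
    source = {"index": source_index, "type": source_type}
    if query is not None:
        # Optional extra filter on top of the type restriction
        source["query"] = query
    return {
        "source": source,
        "dest": {"index": dest_index, "type": dest_type},
    }


typed_body = build_typed_reindex_body(
    "movies", "movie", "70smovies", "movie",
    query={"term": {"year": 1972}},
)
print(json.dumps(typed_body, indent=2))
```

The printed JSON would be sent as the body of `POST /_reindex`.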

Answer 7

Since v7.4, the _clone API is available and can easily meet your needs (read the related prerequisites and the monitoring involved):

POST /<index>/_clone/<target-index>

Or:

PUT /<index>/_clone/<target-index>
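One of those prerequisites is that the source index must be blocked for writes before it can be cloned. The Python sketch below lists the two requests in order without sending them; the function name `clone_plan` and the index names are placeholders. A clone copies the entire index, so keeping only the 1972 documents would still need a separate filtering step afterwards.

```python
def clone_plan(source_index, target_index):
    """Return (method, path, body) steps: block writes, then clone."""
    return [
        # 1. Make the source index read-only for writes, as cloning requires
        ("PUT", "/{}/_settings".format(source_index),
         {"settings": {"index.blocks.write": True}}),
        # 2. Clone the now write-blocked index into the target
        ("POST", "/{}/_clone/{}".format(source_index, target_index), None),
    ]


steps = clone_plan("movies", "movies_copy")
for method, path, body in steps:
    print(method, path, body)
```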

How to copy some ElasticSearch data to a new index
