Question

I am using this Python code to bulk index all my data in Elasticsearch:

from elasticsearch import Elasticsearch, helpers
import json
import os
import sys

es = Elasticsearch()   

def load_json(directory):
    for filename in os.listdir(directory):
        if filename.endswith('.json'):
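            # Join with the directory so the file is found regardless of the working directory
            with open(os.path.join(directory, filename), 'r') as open_file: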
                yield json.load(open_file)

helpers.bulk(es, load_json(sys.argv[1]), index='v1_resume', doc_type='candidate')

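I know that if no ID is specified, ES assigns a 20-character ID on its own, but I want the documents indexed with IDs starting at 1 and running up to the number of documents.

How can I achieve this?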

Answer 1

In Elasticsearch, if you don't pick an ID for a document, one is created for you automatically. See the Elastic docs:

Autogenerated IDs are 20 character long, URL-safe, Base64-encoded GUID 
strings. These GUIDs are generated from a modified FlakeID scheme which 
allows multiple nodes to be generating unique IDs in parallel with 
essentially zero chance of collision.

If you want custom IDs, you need to build them yourself, using a syntax like this:

[
    {'_id': 1,
     '_index': 'index-name',
     '_type': 'document',
     '_source': {
          "title": "Hello World!",
          "body": "..."}

    },
    {'_id': 2,
     '_index': 'index-name',
     '_type': 'document',
     '_source': {
          "title": "Hello World!",
          "body": "..."}
    }
]

helpers.bulk(es, load_json(sys.argv[1]))

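Since you are now declaring the index and type inside each document's schema, you don't have to pass them to the helpers.bulk() method. You need to change the output of 'load_json' so that it produces a list of dicts (like the one above) to be saved in ES (see the Python Elasticsearch client docs).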
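As a rough illustration of that change, here is a minimal sketch, not a definitive implementation. It assumes one JSON document per file, as in the question. The names generate_actions, main, start_id and doc_type are hypothetical and chosen only for this example. The file names are sorted so that the numbering is the same on every run, and '_type' is added only when a type is passed, since recent Elasticsearch versions no longer have mapping types.

```python
import json
import os


def generate_actions(directory, index_name="v1_resume", doc_type=None, start_id=1):
    """Yield one bulk action per .json file, with IDs counting up from start_id."""
    # Sort the names so the ID assigned to each file is reproducible between runs.
    json_files = sorted(f for f in os.listdir(directory) if f.endswith(".json"))
    for doc_id, filename in enumerate(json_files, start=start_id):
        with open(os.path.join(directory, filename), "r") as open_file:
            action = {
                "_index": index_name,
                "_id": doc_id,
                "_source": json.load(open_file),
            }
        # Only older clusters accept a type; leave it out unless the caller asks for it.
        if doc_type is not None:
            action["_type"] = doc_type
        yield action


def main(directory):
    # Imported here so generate_actions can be tried without a running cluster.
    from elasticsearch import Elasticsearch, helpers

    # Same default connection as in the question; newer clients may need a host URL.
    es = Elasticsearch()
    helpers.bulk(es, generate_actions(directory, doc_type="candidate"))
```

Calling main(sys.argv[1]) would then index every file in the given directory with IDs 1, 2, 3 and so on. Because each action is yielded one at a time, the whole directory never has to be held in memory.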

Bulk indexing data in Elasticsearch with sequential IDs
