Bulk indexing data in Elasticsearch with sequential IDs

Time: 2017-05-16 11:11:45

Tags: python json elasticsearch

I'm using this Python code to bulk index all of my data in Elasticsearch:
from elasticsearch import Elasticsearch, helpers
import json
import os
import sys

es = Elasticsearch()

def load_json(directory):
    # Yield the parsed contents of every .json file in `directory`
    for filename in os.listdir(directory):
        if filename.endswith('.json'):
            # Join with the directory so the file is found regardless of the working directory
            with open(os.path.join(directory, filename), 'r') as open_file:
                yield json.load(open_file)

helpers.bulk(es, load_json(sys.argv[1]), index='v1_resume', doc_type='candidate')

I know that if no ID is given, ES generates a 20-character ID on its own, but I want the IDs to start at 1 and go up to the number of documents.

How can I do this?

1 Answer:

Answer 0 (score: 0)

In Elasticsearch, if you don't choose an ID for a document, one is created for you automatically; see the elastic docs here:

Autogenerated IDs are 20 character long, URL-safe, Base64-encoded GUID 
strings. These GUIDs are generated from a modified FlakeID scheme which 
allows multiple nodes to be generating unique IDs in parallel with 
essentially zero chance of collision.

If you want custom IDs, you have to build them yourself, using syntax like this:

[
    {'_id': 1,
     '_index': 'index-name',
     '_type': 'document',
     '_source': {
          "title": "Hello World!",
          "body": "..."}
    },
    {'_id': 2,
     '_index': 'index-name',
     '_type': 'document',
     '_source': {
          "title": "Hello World!",
          "body": "..."}
    }
]

helpers.bulk(es, load_json(sys.argv[1]))
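Writing one action dict per document by hand does not scale. A minimal sketch, assuming the source documents are already plain dicts, numbers them with `enumerate(..., start=1)` so the first document gets `_id` 1. The helper name `make_actions` and the index/type values are hypothetical, chosen only to mirror the example list.

```python
from typing import Any, Dict, Iterable, Iterator


def make_actions(
    docs: Iterable[Dict[str, Any]],
    index: str,
    doc_type: str,
    start: int = 1,
) -> Iterator[Dict[str, Any]]:
    """Wrap each source document (hypothetical helper) in a bulk action with a sequential _id."""
    # enumerate() counts from `start`, so with the default the IDs run
    # 1, 2, ..., up to the number of documents.
    for doc_id, source in enumerate(docs, start=start):
        yield {
            "_id": doc_id,
            "_index": index,
            # Mapping types apply to ES 5.x/6.x; deprecated in 7.x, removed in 8.x.
            "_type": doc_type,
            "_source": source,
        }


docs = [
    {"title": "Hello World!", "body": "..."},
    {"title": "Hello World!", "body": "..."},
]
actions = list(make_actions(docs, "index-name", "document"))
print([a["_id"] for a in actions])  # [1, 2]
```

Because the helper is a generator, it can be passed straight to `helpers.bulk()` without building the whole list in memory first.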

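Since you are specifying the index and type inside each action (the `_index` and `_type` keys), you don't need to pass them to the helpers.bulk() method. Instead, change the output of 'load_json' so that it produces a list of dicts like the ones above, one per document, to be saved in ES (see the python elastic client docs).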
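Putting that advice together with the question's script gives something like the sketch below. It assumes the elasticsearch-py client of the question's era (5.x/6.x), where `Elasticsearch()` with no arguments connects to localhost:9200 and `_type` is still accepted; newer clients need a host URL and no `_type`. The function name `load_actions` is hypothetical. Filenames are sorted because `os.listdir()` returns them in arbitrary order, which would otherwise make the file-to-ID mapping change between runs.

```python
import json
import os
import sys
from typing import Any, Dict, Iterator


def load_actions(directory: str, index: str, doc_type: str) -> Iterator[Dict[str, Any]]:
    """Yield one bulk action per .json file in `directory`, numbering _id from 1."""
    # Sort so the same file always receives the same _id on every run.
    filenames = sorted(f for f in os.listdir(directory) if f.endswith(".json"))
    for doc_id, filename in enumerate(filenames, start=1):
        # Join with the directory; open(filename) alone would look in the CWD.
        with open(os.path.join(directory, filename), "r") as open_file:
            source = json.load(open_file)
        yield {
            "_id": doc_id,
            "_index": index,
            "_type": doc_type,  # ES 5.x/6.x only; drop for 8.x
            "_source": source,
        }


def main(directory: str) -> None:
    # Imported here so load_actions() can be reused without the client installed.
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch()
    # index and type travel inside each action, so bulk() needs no extra arguments.
    helpers.bulk(es, load_actions(directory, "v1_resume", "candidate"))


# Only index when a directory argument was actually supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it as `python script.py <directory>`; documents are indexed with IDs 1 through N in filename order.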