I am using this code to bulk index all my data into Elasticsearch using Python:

from elasticsearch import Elasticsearch, helpers
import json
import os
import sys

es = Elasticsearch()

def load_json(directory):
    # Yield the parsed contents of every .json file in the directory
    for filename in os.listdir(directory):
        if filename.endswith('.json'):
            # Join with the directory so the file opens from any working directory
            with open(os.path.join(directory, filename), 'r') as open_file:
                yield json.load(open_file)

helpers.bulk(es, load_json(sys.argv[1]), index='v1_resume', doc_type='candidate')

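I know that if no ID is specified, ES assigns a 20-character ID on its own, but I want the documents indexed with IDs starting at 1 and counting up to the number of documents.
How can I achieve this?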
Answer 0 (score: 0)
In Elasticsearch, if you don't choose an ID for a document, one is created for you automatically. See the elastic docs:
Autogenerated IDs are 20 character long, URL-safe, Base64-encoded GUID
strings. These GUIDs are generated from a modified FlakeID scheme which
allows multiple nodes to be generating unique IDs in parallel with
essentially zero chance of collision.
If you want custom IDs, you need to build them yourself, using a syntax like this:
[
    {'_id': 1,
     '_index': 'index-name',
     '_type': 'document',
     '_source': {
         "title": "Hello World!",
         "body": "..."}
    },
    {'_id': 2,
     '_index': 'index-name',
     '_type': 'document',
     '_source': {
         "title": "Hello World!",
         "body": "..."}
    }
]
helpers.bulk(es, load_json(sys.argv[1]))
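Since you are declaring the index and type inside each dict, you don't need to pass them to the helpers.bulk() method. You need to change the output of 'load_json' so that it produces a list of dicts like the one above to be saved in ES (python elastic client docs).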
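
One way the change to 'load_json' might look is sketched below. This is only an illustration, not code from the question: the function name generate_actions and its default arguments are made up here, and sorting the filenames is an assumption added so that the same file gets the same ID on every run. It yields dicts instead of building a list, which helpers.bulk() also accepts.

```python
import json
import os


def generate_actions(directory, index="v1_resume", doc_type="candidate"):
    """Yield one bulk action per .json file, with _id counting up from 1."""
    # Assumption: sort the names, because os.listdir() returns them in
    # arbitrary order and the IDs would otherwise differ between runs.
    names = sorted(n for n in os.listdir(directory) if n.endswith(".json"))
    for doc_id, name in enumerate(names, start=1):
        with open(os.path.join(directory, name), "r") as open_file:
            source = json.load(open_file)
        yield {
            "_id": doc_id,
            "_index": index,
            # Mapping types were removed in Elasticsearch 8; drop this key
            # when indexing into a cluster that no longer accepts types.
            "_type": doc_type,
            "_source": source,
        }
```

With this in place, the call from the question would become helpers.bulk(es, generate_actions(sys.argv[1])), with no index or doc_type arguments, since each action already carries them.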