Question

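I want to periodically update data in Elasticsearch.

The file I send for each update may contain both data that already exists in Elasticsearch (to be updated) and new documents (to be inserted).

Since the documents in Elasticsearch are managed by auto-created IDs, I have to look up the ID by searching on the column "code" (which is unique) to check whether the document already exists. If it exists I update it, otherwise I insert it.

I would like to know whether there is a faster way than the code I came up with: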

from elasticsearch import Elasticsearch

es = Elasticsearch()

# Look up the document ID with an exact-match search on the unique "code" field
res = es.search(index=index_name, doc_type=doc_type, body=body_for_search)
hits = res['hits']['hits']
doc_id = hits[0]['_id'] if hits else None

# If the document exists, update it by ID;
# otherwise insert it and let Elasticsearch create the ID
if doc_id:
    # the update API expects the partial document under the "doc" key
    es.update(index=index_name, id=doc_id, doc_type=doc_type, body={'doc': body})
else:
    es.index(index=index_name, doc_type=doc_type, body=body)

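For example, is there a way to have Elasticsearch find the exact match on col["code"] for you, so that you can simply "upsert" the data without specifying an ID? Any advice would be much appreciated, and thank you for reading.

P.S. If we set id = col["code"] it would be simpler and faster, but for management reasons we can't do that right now.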

Answer 1

As @Archit said, use your own IDs so that documents can be looked up faster.
Use the upsert API: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html#upserts
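
A minimal sketch of what that could look like with the Python client, assuming the unique "code" value is used as the document ID and that the client is an older version that still accepts doc_type, as in the question (newer versions drop doc_type). The helper names build_upsert_body and upsert_by_code are made up for illustration; "doc" and "doc_as_upsert" are the fields of the update API.

```python
def build_upsert_body(doc):
    """Build an update body that inserts `doc` if the ID does not exist yet."""
    # "doc_as_upsert": True tells Elasticsearch to index `doc` as a new
    # document when no document with the given ID is found.
    return {"doc": doc, "doc_as_upsert": True}


def upsert_by_code(es, index_name, doc_type, doc):
    """Update the document whose _id equals doc["code"], or create it.

    `es` is expected to be an elasticsearch.Elasticsearch client
    (hypothetical helper; assumes "code" is used as the document ID).
    """
    return es.update(
        index=index_name,
        doc_type=doc_type,  # as in the question; newer clients drop this
        id=doc["code"],
        body=build_upsert_body(doc),
    )
```

With this, each row needs one request instead of a search followed by an update or an index call, for example upsert_by_code(es, index_name, doc_type, {"code": "A-001", "name": "foo"}).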

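Make sure your ID structure follows Lucene good practices:

If using your own ID, try to pick an ID that is friendly to Lucene. Examples include zero-padded sequential IDs, UUID-1, and nanotime; these IDs have consistent, sequential patterns that compress well. In contrast, IDs such as UUID-4 are essentially random, which offers poor compression and slows down Lucene.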
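
As a rough illustration of the ID styles mentioned above, the following sketch builds a zero-padded sequential ID and a UUID-1 using only the standard library. The function names and the width of 10 digits are arbitrary choices for the example, not something prescribed by Elasticsearch.

```python
import uuid


def zero_padded_id(n, width=10):
    """Return a sequential number as a fixed-width, zero-padded string."""
    # Fixed width keeps lexicographic order equal to numeric order.
    return str(n).zfill(width)


def time_based_id():
    """Return a UUID-1 string (built from a timestamp and the node ID)."""
    return str(uuid.uuid1())


print(zero_padded_id(42))  # 0000000042
print(time_based_id())
```

uuid.uuid4() would also produce unique IDs, but those are the random kind the quoted advice warns against.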
