How do I update Elasticsearch documents using Python?

Time: 2020-07-08 15:57:34

Tags: python elasticsearch

I have the code below, which adds data to Elasticsearch:

from elasticsearch import Elasticsearch
es = Elasticsearch()
es.cluster.health()
r = [{'Name': 'Dr. Christopher DeSimone', 'Specialised and Location': 'Health'},
 {'Name': 'Dr. Tajwar Aamir (Aamir)', 'Specialised and Location': 'Health'},
 {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}]
es.indices.create(index='my-index_1', ignore=400)

for doc in r:
    # es.indices.update(index="my-index_1", body=doc)
    # No id is passed, so Elasticsearch generates one for each document
    es.index(index="my-index_1", body=doc)

# Retrieve the data
es.search(index='my-index_1')['hits']['hits']

Requirement: how do I update the documents?

r = [{'Name': 'Dr. Messi', 'Specialised and Location': 'Health'},
     {'Name': 'Dr. Christiano', 'Specialised and Location': 'Health'},
     {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}]

Dr. Messi and Dr. Christiano must be added to the index; Dr. Bernard M. Aaron should not be updated, because it already exists in the index.

1 Answer:

Answer 0 (score: 4)

In Elasticsearch, when you index data without supplying a custom ID, Elasticsearch generates a new ID for every document you index.

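So in your case, because you do not provide an ID, Elasticsearch generates one for you. But you also want to check whether the Name already exists, and how you do that depends on how you index the data. There are two possible solutions.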

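  1. Index the data without passing an _id for each document. You then have to search by Name to check whether the document already exists.
  2. Index the data with your own _id for each document, then look documents up by _id. This is the faster and easier approach.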
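For comparison, the first option could look roughly like the sketch below. The helper names (name_query, name_exists, index_if_new) are hypothetical, and the query assumes the default dynamic mapping, which adds a Name.keyword sub-field for exact matching. The es argument is expected to be a 7.x-style Elasticsearch client, as used elsewhere in this thread.

```python
def name_query(name):
    """Build a search body that matches documents whose Name equals `name` exactly."""
    # Assumption: default dynamic mapping, so an exact-match `Name.keyword` sub-field exists.
    return {"query": {"term": {"Name.keyword": name}}}


def name_exists(es, index, name):
    """Return True if at least one document in `index` has this exact Name."""
    return es.count(index=index, body=name_query(name))["count"] > 0


def index_if_new(es, index, doc):
    """Index `doc` with an auto-generated _id unless its Name is already present."""
    if name_exists(es, index, doc["Name"]):
        return False
    es.index(index=index, body=doc)
    return True
```

Keep in mind that a newly indexed document only becomes visible to searches after the index refreshes, so a check made right after indexing can miss it. That is one reason the second option, which fetches by _id, tends to be simpler.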

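I will go with the second approach and create my own IDs. Since the lookup is on Name, I will derive the _id from the Name field: the _id is the hash of the Name value. I use md5 here, but you can use any other hash function.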
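As a small illustration of the ID scheme just described, the hashing step can be pulled into a helper. The name doc_id_for is hypothetical. Note that the hash is sensitive to case and whitespace, so "Dr. Messi" and "dr. messi" produce different IDs.

```python
import hashlib


def doc_id_for(name):
    """Return a deterministic document _id: the md5 hex digest of the Name value."""
    return hashlib.md5(name.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # The same Name always maps to the same 32-character hex string.
    print(doc_id_for("Dr. Bernard M. Aaron"))
```

Because the ID depends only on Name, indexing the same Name twice targets the same document instead of creating a duplicate.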

First, index the data:

import hashlib
from elasticsearch import Elasticsearch

es = Elasticsearch()
es.cluster.health()
r = [{'Name': 'Dr. Christopher DeSimone', 'Specialised and Location': 'Health'},
     {'Name': 'Dr. Tajwar Aamir (Aamir)', 'Specialised and Location': 'Health'},
     {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}]

index_name = "my-index_1"
# ignore=400 suppresses the error raised when the index already exists
es.indices.create(index=index_name, ignore=400)

for doc in r:
    # Use the md5 hash of Name as the document _id
    es.index(index=index_name, body=doc, id=hashlib.md5(doc['Name'].encode()).hexdigest())

Output:

[{'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '1164c423bc4e2fcb75697c3031af9ef1',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Christopher DeSimone',
   'Specialised and Location': 'Health'}},
 {'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '672ae14197a135c39eab759be8b0597f',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Tajwar Aamir (Aamir)',
   'Specialised and Location': 'Health'}},
 {'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '85702447f9e9ea010054eaf0555ce79c',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Bernard M. Aaron',
   'Specialised and Location': 'Health'}}]

Next, index the new data:

from elasticsearch import NotFoundError

r = [{'Name': 'Dr. Messi', 'Specialised and Location': 'Health'},
     {'Name': 'Dr. Christiano', 'Specialised and Location': 'Health'},
     {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}]

for rec in r:
    doc_id = hashlib.md5(rec['Name'].encode()).hexdigest()
    try:
        # Raises NotFoundError if no document with this _id exists
        es.get(index=index_name, id=doc_id)
    except NotFoundError:
        print("Record Not found")
        es.index(index=index_name, body=rec, id=doc_id)

Output:

[{'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '1164c423bc4e2fcb75697c3031af9ef1',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Christopher DeSimone',
   'Specialised and Location': 'Health'}},
 {'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '672ae14197a135c39eab759be8b0597f',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Tajwar Aamir (Aamir)',
   'Specialised and Location': 'Health'}},
 {'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '85702447f9e9ea010054eaf0555ce79c',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Bernard M. Aaron',
   'Specialised and Location': 'Health'}},
 {'_index': 'my-index_1',
  '_type': '_doc',
  '_id': 'e2e0f463145568471097ff027b18b40d',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Messi', 'Specialised and Location': 'Health'}},
 {'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '23bb4f1a3a41efe7f4cab8a80d766708',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Christiano', 'Specialised and Location': 'Health'}}]

As you can see, the Dr. Bernard M. Aaron record was not indexed again, because it already exists.
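The get-then-index loop above makes two round trips per record. One possible alternative, not part of the original answer, is to send "create" operations through the bulk helper: a create fails with a 409 conflict when the _id already exists, so existing documents are left untouched. The function name create_actions is hypothetical; the action keys (_op_type, _index, _id, _source) are the ones elasticsearch.helpers.bulk accepts.

```python
import hashlib


def create_actions(index, docs):
    """Yield bulk actions that create each document only if its _id is not taken."""
    for doc in docs:
        yield {
            "_op_type": "create",  # rejected with a 409 conflict if the _id already exists
            "_index": index,
            "_id": hashlib.md5(doc["Name"].encode()).hexdigest(),
            "_source": doc,
        }


# Usage against a live cluster (assumed setup, not run here):
#   from elasticsearch import Elasticsearch, helpers
#   es = Elasticsearch()
#   created, errors = helpers.bulk(es, create_actions("my-index_1", r),
#                                  raise_on_error=False)
#   # `errors` lists the documents that were skipped because they already existed.
```

With raise_on_error=False the helper collects the conflicts instead of raising, so the call returns the number of documents created along with the list of rejected ones.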
