The correct way to store images in Elasticsearch using Python 3.6.0

Time: 2017-06-17 09:37:20

Tags: image python-3.x elasticsearch

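Constraints

  • Language: Python 3.6.0
  • Image size is at most 5 MB [formats: .PNG, .JPG, and .JPEG]
  • I must store the images in Elasticsearch. This is a requirement. However, the format I use does not matter as long as the image can be reconstructed.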

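I have the physical location of the image. I open the image and convert it to base64. Then I try to index it in the Elasticsearch instance running on my localhost, but it does not work. I think I need to use the bulk API here, but I found that the bulk API requires actions generators. In my case, how can I use bulk to save my image in Elasticsearch? Or is there another efficient way to index images in Elasticsearch?

Note that I can successfully load the image and encode it as bytes. Other index and search (GET) queries work fine on my localhost:9200.

This is my approach so far.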

from elasticsearch import Elasticsearch
import base64
import time
import uuid

client = Elasticsearch([{'host': 'localhost', 'port': 9200}])

def persist_image_in_elastic(imagePath):
    curMethodst = time.time()
    # imagePath = 'images/heroalom/image_22.png'
    with open(imagePath, "rb") as imageFile:
        # b64encode returns bytes, not str
        rawImage = base64.b64encode(imageFile.read())

    elasticIndex = 'raw-image-index'
    doc_type = 'raw-image'
    idForReceivedImage = 'f00b5f7c17534d22ab5cfb950bea972c'
    rawImageModel = {'id': idForReceivedImage, 'raw': rawImage}
    elasticResp = client.index(index=elasticIndex, doc_type=doc_type, id=idForReceivedImage, body=rawImageModel)

Elasticsearch mapping

{
   "raw-image-index": {
      "mappings": {
         "raw-image": {
            "properties": {
               "id": {
                  "type": "text"
               },
               "raw": {
                  "type": "text"
               }
            }
         }
      }
   }
}

1 answer:

Answer 0 (score: 2)

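You are almost there. The only thing you need to do is convert rawImage from bytes to str before indexing it. In Python 3, wrapping it in a str() call would store the "b'...'" representation rather than the base64 text itself, so decode it instead, like this:

rawImageModel = {'id': 'f00b5f7c17534d22ab5cfb950bea972c', 'raw': rawImage.decode('ascii')}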

Now a bit of explanation. base64.b64encode returns an object of type bytes, whereas the Elasticsearch client expects a str that it can serialize to JSON.
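The bytes-to-str conversion and the reconstruction requirement from the question can be checked without a running cluster. The following is a minimal sketch, not the asker's actual code: the helper names encode_image_bytes and decode_image_text and the stand-in sample bytes are assumptions made for illustration, and only the standard library is used.

```python
import base64
import json


def encode_image_bytes(image_bytes):
    """Return the base64 text (a str) for raw image bytes."""
    return base64.b64encode(image_bytes).decode('ascii')


def decode_image_text(encoded_text):
    """Rebuild the original image bytes from the stored base64 text."""
    return base64.b64decode(encoded_text)


# Stand-in bytes (a PNG signature plus filler); a real file's contents would go here.
sample_bytes = b'\x89PNG\r\n\x1a\n' + bytes(range(16))

encoded = encode_image_bytes(sample_bytes)
document = {'id': 'f00b5f7c17534d22ab5cfb950bea972c', 'raw': encoded}

# A str value serializes to JSON without raising.
payload = json.dumps(document)

# Round trip: what comes back out of the JSON document decodes to the same bytes.
restored = decode_image_text(json.loads(payload)['raw'])
```

Because the stored value is plain base64 text, the image can be rebuilt later by decoding the 'raw' field and writing the bytes back to a file.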

In fact, the Python code you provided raises an exception that can be used for debugging:

Traceback (most recent call last):
  File "code.py", line 19, in <module>
    persist_image_in_elastic('/Users/vasiliev/Downloads/es_logo_small.png')
  File "code.py", line 17, in persist_image_in_elastic
    elasticResp = client.index(index = elasticIndex, doc_type = doc_type,id = 'f00b5f7c17534d22ab5cfb950bea972c', body = rawImageModel)
  File "/Users/vasiliev/.virtualenvs/es-blob-3.6/lib/python3.6/site-packages/elasticsearch/client/utils.py", line 73, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/Users/vasiliev/.virtualenvs/es-blob-3.6/lib/python3.6/site-packages/elasticsearch/client/__init__.py", line 298, in index
    _make_path(index, doc_type, id), params=params, body=body)
  File "/Users/vasiliev/.virtualenvs/es-blob-3.6/lib/python3.6/site-packages/elasticsearch/transport.py", line 278, in perform_request
    body = self.serializer.dumps(body)
  File "/Users/vasiliev/.virtualenvs/es-blob-3.6/lib/python3.6/site-packages/elasticsearch/serializer.py", line 50, in dumps
    raise SerializationError(data, e)
elasticsearch.exceptions.SerializationError: ({'id': 'f00b5f7c17534d22ab5cfb950bea972c', 'raw': b'iVB...mCC'}, TypeError("Unable to serialize b'iVB...mCC' (type: <class 'bytes'>)",))
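The TypeError at the bottom of the traceback can be reproduced in isolation. This is a rough sketch that assumes the client's default serializer behaves like the standard json module for this case; the helper try_serialize is hypothetical and exists only to capture the outcome.

```python
import base64
import json

# b64encode returns bytes, which is what the question's code passed to the client.
raw_image = base64.b64encode(b'not really an image')


def try_serialize(value):
    """Return None if json.dumps succeeds, otherwise the TypeError message."""
    try:
        json.dumps({'raw': value})
        return None
    except TypeError as error:
        return str(error)


# bytes cannot be serialized to JSON, so a message comes back.
bytes_error = try_serialize(raw_image)

# The decoded str serializes cleanly, so the result is None.
str_error = try_serialize(raw_image.decode('ascii'))
```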

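As a final remark, consider using the binary data type to store binary data. With the mapping you provided, Elasticsearch will put every encoded image into the full-text search index, where you will not be able to query it in any useful way. Another option is to mark this field as not indexed: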

{
   "raw-image-index": {
      "mappings": {
         "raw-image": {
            "properties": {
               "id": {
                  "type": "text"
               },
               "raw": {
                  "type": "text",
                  "index": false
               }
            }
         }
      }
   }
}
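For the binary data type mentioned above, and for the bulk question in the original post, a sketch might look like the following. It is an illustration under assumptions, not a tested deployment: the mapping body binary_mapping, the generator generate_image_actions, and the sample ids are hypothetical, and the action layout (_index, _type, _id, _source) follows the doc_type style used in the question. No cluster is contacted here.

```python
import base64

# Hypothetical mapping body that stores 'raw' as a binary field (base64 text).
binary_mapping = {
    'mappings': {
        'raw-image': {
            'properties': {
                'id': {'type': 'text'},
                'raw': {'type': 'binary'},
            }
        }
    }
}


def generate_image_actions(images, index='raw-image-index', doc_type='raw-image'):
    """Yield one bulk action per (image_id, image_bytes) pair."""
    for image_id, image_bytes in images:
        yield {
            '_index': index,
            '_type': doc_type,
            '_id': image_id,
            '_source': {
                'id': image_id,
                # Decode so the value is a JSON-serializable str, not bytes.
                'raw': base64.b64encode(image_bytes).decode('ascii'),
            },
        }


# Stand-in data; real code would read the bytes from image files.
sample_images = [('img-1', b'first image bytes'), ('img-2', b'second image bytes')]
actions = list(generate_image_actions(sample_images))
```

With a live client, the mapping could be supplied when creating the index (for example via client.indices.create(index='raw-image-index', body=binary_mapping)) and the generator passed to elasticsearch.helpers.bulk(client, generate_image_actions(sample_images)). For a single image of at most 5 MB, the plain client.index call from the question is likely sufficient; bulk mainly helps when many images are indexed at once.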