Getting `TypeError: unhashable type: 'dict'` during bulk upload in Elasticsearch

Asked: 2019-08-30 07:19:11

Tags: python python-3.x elasticsearch

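I am using the bulk method to index data into Elasticsearch in order to minimize indexing time. The problem is that after switching to the bulk method, my old queries fail (meaning they return 0 hits); even a simple match query returns zero matches.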

Elasticsearch version: 6.3, language: Python, library: Python Elasticsearch Client

Initially, I indexed the data into Elasticsearch with this code.


temp_entities_list = []
for each_row in master_entities:
    entity_data = {}
    entity_data['entity_id'] = each_row.id
    entity_data['createdat'] = each_row.createdat
    entity_data['updatedat'] = each_row.updatedat
    entity_data['individual_business_tag']=each_row.individual_business_tag
    temp_entities_list.append(entity_data)

def indexing(entity_list):
    for entity in entity_list:
        index_name = "demo"
        yield{
            "_index":index_name,
            "_type":"businesses",
            "_source" :{
                "body":entity
            }
        }
try:
    helpers.bulk(es, indexing(temp_entities_list))
except Exception as exe:
    indexing_logger.exception("Error:"+str(exe))

This is my old query, which worked fine when I indexed one object at a time.

{
    "query": {
        "match" : {
            "entity_name" : {
                "query" : "Premium Market",
                "operator" : "and"
            }
        }
    }
}

Following the documentation at https://elasticsearch-py.readthedocs.io/en/master/helpers.html#example, I tried this code:

def indexing(entity_list):
    for entity in entity_list:
        index_name = "demo"
        yield{
            "_index":index_name,
            "_type":"businesses",
            "doc" :{entity
            }
        }

I get this error:

Traceback (most recent call last):
  File "sql-to-elasticsearch.py", line 90, in <module>
    helpers.bulk(es,indexing(temp_entities_list),chunk_size=500,)
  File "C:\Users\AppData\Local\Programs\Python\Python37-32\lib\site-packages\elasticsearch\helpers\__init__.py", line 257, in bulk
    for ok, item in streaming_bulk(client, actions, *args, **kwargs):
  File "C:\Users\AppData\Local\Programs\Python\Python37-32\lib\site-packages\elasticsearch\helpers\__init__.py", line 180, in streaming_bulk
    client.transport.serializer):
  File "C:\Users\AppData\Local\Programs\Python\Python37-32\lib\site-packages\elasticsearch\helpers\__init__.py", line 58, in _chunk_actions
    for action, data in actions:
  File "sql-to-elasticsearch.py", line 81, in indexing
    index_name = "demo"
TypeError: unhashable type: 'dict'

1 Answer:

Answer 0: (score: 2)

I believe this is what causes the error:

"doc" :{entity}

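Your entity appears to be a dictionary, and you are trying to put it into a set. In Python, only hashable objects can be stored in a set (strings, integers, floats, tuples of hashable items...), and a dict is mutable and therefore not hashable.

Note that the {...} notation with bare values (no key: value pairs) creates a set.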
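The behaviour can be reproduced without Elasticsearch at all. The snippet below is a minimal sketch; the sample `entity` dict is made up for illustration and only mirrors the field names from the question.

```python
# Hypothetical sample record, shaped like one item of temp_entities_list.
entity = {"entity_id": 1, "individual_business_tag": "business"}

# {entity} is a set literal with one element, not a dict.
# Building the set requires hashing the element, and dicts are not hashable.
try:
    broken = {entity}
except TypeError as exc:
    message = str(exc)  # the message mentions "unhashable type: 'dict'"

print(message)

# Hashable values are accepted in a set literal.
ok = {"a", 1, 2.5, (1, 2)}
print(type(ok).__name__)
```

The exact wording of the message can differ slightly between Python versions, but it is the same `TypeError` shown in the traceback from the question.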

If you want to put it in a container, I would suggest using a list:

"doc" : [entity]

Or, if you just want "doc" to point to entity directly:

 "doc" : entity
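Putting this together, the generator from the question might look like the sketch below. It is not a definitive fix. It assumes you want the entity's fields at the top level of each document, so it passes the dict under the `_source` key, which the bulk helper treats as the document body. `generate_actions` and the sample data are hypothetical names, and the actual `helpers.bulk` call is left as a comment because it needs a running cluster.

```python
def generate_actions(entity_list, index_name="demo", doc_type="businesses"):
    """Yield one bulk action per entity, using the entity dict as the document."""
    for entity in entity_list:
        yield {
            "_index": index_name,
            "_type": doc_type,  # mapping types as used with Elasticsearch 6.x in the question
            "_source": entity,  # the dict itself: not {entity}, not {"body": entity}
        }


# Hypothetical sample record, shaped like one item of temp_entities_list.
sample = [{"entity_id": 1, "individual_business_tag": "business"}]
actions = list(generate_actions(sample))
print(actions[0]["_source"])

# With a live cluster (assumption: `es` is an Elasticsearch client instance):
# from elasticsearch import helpers
# helpers.bulk(es, generate_actions(temp_entities_list), chunk_size=500)
```

If the action has no `_source` key, the helper sends the remaining keys as the document, so with "doc": entity the fields would be indexed nested under `doc`. In the same way, the first snippet in the question nests every field under `body`, which may be why a match on a top-level field name returns no hits. That last point is a guess, not something the question confirms.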

Hope this helps.