Elasticsearch streaming_bulk generator is not iterable

Date: 2018-07-16 09:42:36

Tags: python elasticsearch

I am trying to index some documents using the elasticsearch.helpers.streaming_bulk function. When I try to fetch the results following the example from here, I get the error: TypeError: 'function' object is not iterable

Here is my function:

import csv  # needed for csv.reader below

import elasticsearch.helpers
from elasticsearch import Elasticsearch

def index_with_streaming_bulk(self):

    all_body = []

    with open(self.geonames_file, encoding='utf-8') as csvfile:

        reader = csv.reader(csvfile, delimiter='\t')
        body = []
        next(reader)  # skip column names
        for row_ind, row in enumerate(reader):
            body.append({
                "index": {
                    "_id": row_ind+1  # to map index value to geonames. remove the column headers
                }
            })
            doc = {}

            for field_tup in self.included_cols:
                field_name = field_tup[0]
                field_ind = field_tup[1]
                field_type = field_tup[2]
                val_init = row[field_ind]

                mod_val = self.transform_value(field_type, val_init)
                doc[field_name] = mod_val

            body.append(doc)
            all_body.append(body)

    def gendata():
        for body in all_body:
            yield body

    res = elasticsearch.helpers.streaming_bulk(client=es, actions=gendata, chunk_size=500,
                                               max_retries=5, initial_backoff=2, max_backoff=600,
                                               request_timeout=20)

    for ok, response in res:
        print(ok, response)

Edit: here is the full stack trace:

"C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\python.exe" C:/Users/admin/PycharmProjects/ElasticSearch/ES_Indexer_Geonames.py
Traceback (most recent call last):
  File "C:/Users/admin/PycharmProjects/ElasticSearch/ES_Indexer_Geonames.py", line 267, in <module>
    Indexer(init_hydro_concat, index_name, doc_name).index_with_streaming_bulk()
  File "C:/Users/admin/PycharmProjects/ElasticSearch/ES_Indexer_Geonames.py", line 207, in index_with_streaming_bulk
    for ok, response in res:
  File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\elasticsearch\helpers\__init__.py", line 176, in streaming_bulk
    actions = map(expand_action_callback, actions)
TypeError: 'function' object is not iterable

Thanks for your help!

2 Answers:

Answer 0 (score: 1)

This was caused by how the body was constructed. Each action needs to be created as a single dict, and all the action dicts collected into a list. Here is the solution:

def index_with_streaming_bulk(self):

    all_body = []

    with open(self.geonames_file, encoding='utf-8') as csvfile:

        reader = csv.reader(csvfile, delimiter='\t')
        body = {}
        next(reader)  # skip column names

        for row_ind, row in enumerate(reader):

            body['_index'] = self.index_name
            body['_type'] = self.doc_type
            body['_id'] = row_ind + 1  # to map index value to geonames. remove the column headers

            for field_tup in self.included_cols:
                field_name = field_tup[0]
                field_ind = field_tup[1]
                field_type = field_tup[2]
                val_init = row[field_ind]

                mod_val = self.transform_value(field_type, val_init)
                body[field_name] = mod_val

            all_body.append(body)
            body = {}

    # Note: a generator is no longer needed, since all_body is itself
    # an iterable of action dicts and can be passed to actions directly.

    res = elasticsearch.helpers.streaming_bulk(client=es, actions=all_body, chunk_size=1000, max_retries=5,
                                               initial_backoff=2, max_backoff=600, request_timeout=3600)
    for ok, response in res:
        print(ok, response)
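The key change above is the shape of each action: metadata fields (_index, _type, _id) and document fields live together in one flat dict, and the list of such dicts is iterable, so it can be passed to actions directly. A minimal, self-contained sketch of that construction, with made-up field names and rows for illustration:

```python
def build_actions(rows, index_name, doc_type):
    """Turn tabular rows into streaming_bulk-style action dicts,
    one flat dict per document, collected into a plain list."""
    actions = []
    for row_ind, (name, population) in enumerate(rows):
        actions.append({
            "_index": index_name,
            "_type": doc_type,
            "_id": row_ind + 1,  # 1-based ids, as in the answer above
            "name": name,
            "population": population,
        })
    return actions

rows = [("Danube", 2850), ("Rhine", 1230)]
actions = build_actions(rows, "geonames", "river")
print(actions[0])
# {'_index': 'geonames', '_type': 'river', '_id': 1, 'name': 'Danube', 'population': 2850}
```

The resulting list could then be handed to elasticsearch.helpers.streaming_bulk as the actions argument, exactly as in the solution code.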

Answer 1 (score: 0)

According to the elasticsearch.helpers.streaming_bulk documentation, the actions parameter is an iterable containing the actions to be performed, not a function that produces such an iterable.

There are several examples of this function's usage, and in all of them the value passed for the actions parameter is the result of calling a function, not the function itself. So I believe in your case it should be:

    res = elasticsearch.helpers.streaming_bulk(client=es, actions=gendata(), chunk_size=500,
                                               max_retries=5, initial_backoff=2,
                                               max_backoff=600, request_timeout=20)

Note the () after gendata: it means the function is actually called, and the generator it returns is passed as the argument, rather than the function object itself.
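The distinction can be reproduced without Elasticsearch at all. The traceback shows the helper doing actions = map(expand_action_callback, actions), and map() requires its second argument to be iterable; a function object is not, but the generator returned by calling it is. A tiny standalone sketch:

```python
def gendata():
    # A generator function, like the one in the question.
    for item in [{"_id": 1}, {"_id": 2}]:
        yield item

# Passing the function object itself fails, just as in the traceback:
try:
    map(lambda a: a, gendata)
except TypeError as exc:
    print(exc)  # 'function' object is not iterable

# Calling it first produces a generator, which iterates fine:
print([a["_id"] for a in gendata()])  # [1, 2]
```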