I'm trying to index some documents with the elasticsearch.helpers.streaming_bulk
function. When I try to consume the results following the example from here, I get this error: TypeError: 'function' object is not iterable.
Here is my function:
import csv
import elasticsearch.helpers
from elasticsearch import Elasticsearch

def index_with_streaming_bulk(self):
    all_body = []
    with open(self.geonames_file, encoding='utf-8') as csvfile:
        reader = csv.reader(csvfile, delimiter='\t')
        body = []
        next(reader)  # skip column names
        for row_ind, row in enumerate(reader):
            body.append({
                "index": {
                    "_id": row_ind + 1  # to map index value to geonames; the header row is skipped
                }
            })
            doc = {}
            for field_tup in self.included_cols:
                field_name = field_tup[0]
                field_ind = field_tup[1]
                field_type = field_tup[2]
                val_init = row[field_ind]
                mod_val = self.transform_value(field_type, val_init)
                doc[field_name] = mod_val
            body.append(doc)
            all_body.append(body)

    def gendata():
        for body in all_body:
            yield body

    res = elasticsearch.helpers.streaming_bulk(client=es, actions=gendata, chunk_size=500,
                                               max_retries=5, initial_backoff=2, max_backoff=600,
                                               request_timeout=20)
    for ok, response in res:
        print(ok, response)
EDIT: Here is the full stack trace:
"C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\python.exe" C:/Users/admin/PycharmProjects/ElasticSearch/ES_Indexer_Geonames.py
Traceback (most recent call last):
File "C:/Users/admin/PycharmProjects/ElasticSearch/ES_Indexer_Geonames.py", line 267, in <module>
Indexer(init_hydro_concat, index_name, doc_name).index_with_streaming_bulk()
File "C:/Users/admin/PycharmProjects/ElasticSearch/ES_Indexer_Geonames.py", line 207, in index_with_streaming_bulk
for ok, response in res:
File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\elasticsearch\helpers\__init__.py", line 176, in streaming_bulk
actions = map(expand_action_callback, actions)
TypeError: 'function' object is not iterable
Thanks for your help!
Answer 0 (score: 1)
This was caused by how the body actions were constructed. I needed to build each body as a dict and collect all the body dicts into a list. Here is the solution:
def index_with_streaming_bulk(self):
    all_body = []
    with open(self.geonames_file, encoding='utf-8') as csvfile:
        reader = csv.reader(csvfile, delimiter='\t')
        body = {}
        next(reader)  # skip column names
        for row_ind, row in enumerate(reader):
            body['_index'] = self.index_name
            body['_type'] = self.doc_type
            body['_id'] = row_ind + 1  # to map index value to geonames; the header row is skipped
            for field_tup in self.included_cols:
                field_name = field_tup[0]
                field_ind = field_tup[1]
                field_type = field_tup[2]
                val_init = row[field_ind]
                mod_val = self.transform_value(field_type, val_init)
                body[field_name] = mod_val
            all_body.append(body)
            body = {}  # start a fresh action dict for the next row

    # all_body is a list, which is itself iterable, so it is passed to
    # streaming_bulk directly below; this inner generator is left unused.
    def gendata():
        for body in all_body:
            yield body

    # `es` is assumed to be an Elasticsearch client created elsewhere,
    # e.g. es = Elasticsearch()
    res = elasticsearch.helpers.streaming_bulk(client=es, actions=all_body, chunk_size=1000, max_retries=5,
                                               initial_backoff=2, max_backoff=600, request_timeout=3600)
    for ok, response in res:
        print(ok, response)
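Since streaming_bulk accepts any iterable, you could also skip building all_body entirely and yield each action straight from the CSV. A minimal sketch, not the original poster's code, assuming the same attributes and helpers as above (geonames_file, index_name, doc_type, included_cols, transform_value) and a module-level client es:

def index_with_generator(self):
    # Hypothetical variant: yields one action dict per CSV row, so the
    # whole file never has to be held in memory at once.
    def gen_actions():
        with open(self.geonames_file, encoding='utf-8') as csvfile:
            reader = csv.reader(csvfile, delimiter='\t')
            next(reader)  # skip column names
            for row_ind, row in enumerate(reader):
                action = {'_index': self.index_name,
                          '_type': self.doc_type,
                          '_id': row_ind + 1}
                for field_name, field_ind, field_type in self.included_cols:
                    action[field_name] = self.transform_value(field_type, row[field_ind])
                yield action

    # Note the call: gen_actions() returns a generator, which is iterable.
    for ok, response in elasticsearch.helpers.streaming_bulk(client=es, actions=gen_actions(),
                                                             chunk_size=1000, max_retries=5,
                                                             initial_backoff=2, max_backoff=600,
                                                             request_timeout=3600):
        print(ok, response)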
Answer 1 (score: 0)
According to the elasticsearch.helpers.streaming_bulk
documentation, the actions
parameter is an iterable containing the actions to perform, not a function that produces that iterable.
I found several examples of this helper's usage, and in all of them the value of the actions
argument is the result of calling a function, not the function itself. So I believe in your case it should be:
res = elasticsearch.helpers.streaming_bulk(client=es, actions=gendata(), chunk_size=500, max_retries=5,
                                           initial_backoff=2, max_backoff=600, request_timeout=20)
Note the () after gendata
: it means the function is actually called, and the generator it returns is passed as the argument instead of the function itself.
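The traceback confirms this: streaming_bulk starts with actions = map(expand_action_callback, actions), and map() calls iter() on its argument right away, which fails for a plain function. A tiny standalone illustration of the difference:

def gendata():
    yield {'_id': 1}

# map(str, gendata)              # TypeError: 'function' object is not iterable
print(list(map(str, gendata()))) # works: gendata() returns a generator, which is iterable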