我试图在Elasticsearch Python客户端上运行a multi search request。我可以正确运行单一搜索,但无法弄清楚如何格式化msearch的请求。根据文档,请求的正文需要格式化为:
请求定义(元数据搜索请求定义对),如 要么是换行符分隔的字符串,要么是要序列化的序列序列 (每行一个)。
创建此请求正文的最佳方法是什么?我一直在寻找例子,但似乎找不到任何例子。
答案 0 :(得分:15)
如果你按照official doc的演示(甚至认为它是BulkAPI),你会发现如何使用Elasticsearch
客户端在python中构建你的请求:
以下是换行符分隔字符串:
def msearch():
es = get_es_instance()
search_arr = []
# req_head
search_arr.append({'index': 'my_test_index', 'type': 'doc_type_1'})
# req_body
search_arr.append({"query": {"term" : {"text" : "bag"}}, 'from': 0, 'size': 2})
# req_head
search_arr.append({'index': 'my_test_index', 'type': 'doc_type_2'})
# req_body
search_arr.append({"query": {"match_all" : {}}, 'from': 0, 'size': 2})
request = ''
for each in search_arr:
request += '%s \n' %json.dumps(each)
# as you can see, you just need to feed the <body> parameter,
# and don't need to specify the <index> and <doc_type> as usual
resp = es.msearch(body = request)
如您所见,最终请求由几个req_unit构成。 每个req_unit构造如下所示:
request_header(search control about index_name, optional mapping-types, search-types etc.)\n
reqeust_body(which involves query detail about this request)\n
序列化方式的序列与前一个序列几乎相同,只是您不需要将其转换为字符串:
def msearch():
es = get_es_instance()
request = []
req_head = {'index': 'my_test_index', 'type': 'doc_type_1'}
req_body = {
'query': {'term': {'text' : 'bag'}},
'from' : 0, 'size': 2 }
request.extend([req_head, req_body])
req_head = {'index': 'my_test_index', 'type': 'doc_type_2'}
req_body = {
'query': {'range': {'price': {'gte': 100, 'lt': 300}}},
'from' : 0, 'size': 2 }
request.extend([req_head, req_body])
resp = es.msearch(body = request)
答案 1 :(得分:4)
如果您使用的是elasticsearch-dsl,则可以使用课程MultiSearch。
文档中的示例:
from elasticsearch_dsl import MultiSearch, Search
ms = MultiSearch(index='blogs')
ms = ms.add(Search().filter('term', tags='python'))
ms = ms.add(Search().filter('term', tags='elasticsearch'))
responses = ms.execute()
for response in responses:
print("Results for query %r." % response.search.query)
for hit in response:
print(hit.title)
答案 2 :(得分:1)
这是我想出的。我使用相同的文档类型和索引,所以我优化了代码以使用相同的标题运行多个查询:
from elasticsearch import Elasticsearch
from elasticsearch import exceptions as es_exceptions
import json
RETRY_ATTEMPTS = 10
RECONNECT_SLEEP_SECS = 0.5
def msearch(es_conn, queries, index, doc_type, retries=0):
"""
Es multi-search query
:param queries: list of dict, es queries
:param index: str, index to query against
:param doc_type: str, defined doc type i.e. event
:param retries: int, current retry attempt
:return: list, found docs
"""
search_header = json.dumps({'index': index, 'type': doc_type})
request = ''
for q in queries:
# request head, body pairs
request += '{}\n{}\n'.format(search_header, json.dumps(q))
try:
resp = es_conn.msearch(body=request, index=index)
found = [r['hits']['hits'] for r in resp['responses']]
except (es_exceptions.ConnectionTimeout, es_exceptions.ConnectionError,
es_exceptions.TransportError): # pragma: no cover
logging.warning("msearch connection failed, retrying...") # Retry on timeout
if retries > RETRY_ATTEMPTS: # pragma: no cover
raise
time.sleep(RECONNECT_SLEEP_SECS)
found = msearch(queries=queries, index=index, retries=retries + 1)
except Exception as e: # pragma: no cover
logging.critical("msearch error {} on query {}".format(e, queries))
raise
return found
es_conn = Elasticsearch()
queries = []
queries.append(
{"min_score": 2.0, "query": {"bool": {"should": [{"match": {"name.tokenized": {"query": "batman"}}}]}}}
)
queries.append(
{"min_score": 1.0, "query": {"bool": {"should": [{"match": {"name.tokenized": {"query": "ironman"}}}]}}}
)
queries.append(
{"track_scores": True, "min_score": 9.0, "query":
{"bool": {"should": [{"match": {"name": {"query": "not-findable"}}}]}}}
)
q_results = msearch(es_conn, queries, index='pipeliner_current', doc_type='event')
如果您想对同一索引和文档类型进行多次查询,这可能是您正在寻找的。 p>
答案 3 :(得分:0)
知道了!这是我为其他人做的......
query_list = ""
es = ElasticSearch("myurl")
for obj in my_list:
query = constructQuery(name)
query_count += 1
query_list += json.dumps({})
query_list += json.dumps(query)
if query_count <= 19:
query_list += "\n"
if query_count == 20:
es.msearch(index = "m_index", body = query_list)
我不得不两次添加索引而搞砸了。即使使用Python客户端,您仍然必须包含原始文档中描述的索引部分。现在可以使用了!