Question

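I'm trying to run a multi search request with the Elasticsearch Python client. I can run a single search correctly, but I can't figure out how to format the request for msearch. According to the documentation, the body of the request needs to be formatted as: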

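The request definitions (metadata-search request definition pairs), as either a newline separated string, or a sequence of dicts to serialize (one per row).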

What is the best way to create this request body? I have been looking for examples but can't seem to find any.

Answer 1

If you follow the demo in the official doc (even though it is for the Bulk API), you will see how to construct your request in Python with the Elasticsearch client:

Here is the newline-separated string way:

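import json

def msearch():
    # get_es_instance() is not defined in this answer; it is assumed to
    # return a configured Elasticsearch client
    es = get_es_instance()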

    search_arr = []
    # req_head
    search_arr.append({'index': 'my_test_index', 'type': 'doc_type_1'})
    # req_body
    search_arr.append({"query": {"term" : {"text" : "bag"}}, 'from': 0, 'size': 2})

    # req_head
    search_arr.append({'index': 'my_test_index', 'type': 'doc_type_2'})
    # req_body
    search_arr.append({"query": {"match_all" : {}}, 'from': 0, 'size': 2})

    request = ''
    for each in search_arr:
        request += '%s\n' % json.dumps(each)

    # as you can see, you just need to feed the <body> parameter,
    # and don't need to specify the <index> and <doc_type> as usual 
    resp = es.msearch(body = request)

As you can see, the final request is made up of several req_units. Each req_unit is constructed as follows:

request_header(search control about index_name, optional mapping-types, search-types etc.)\n
request_body(which contains the query details for this request)\n
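
The header/body pairing described above can be sketched as a small helper that needs no running cluster. The name `build_msearch_body` and the sample index name are illustrative assumptions, not part of the client library; the resulting string is the kind of value that would be passed as `body` to `es.msearch`.

```python
import json


def build_msearch_body(pairs):
    """Serialize (header, body) pairs into a newline-delimited msearch body.

    `pairs` is an iterable of (header_dict, body_dict) tuples. This helper is
    hypothetical; it is not part of the elasticsearch client.
    """
    lines = []
    for header, body in pairs:
        lines.append(json.dumps(header))  # request_header line
        lines.append(json.dumps(body))    # request_body line
    # Every line, including the last one, is terminated by a newline.
    return "".join(line + "\n" for line in lines)


pairs = [
    ({"index": "my_test_index"}, {"query": {"term": {"text": "bag"}}, "size": 2}),
    ({"index": "my_test_index"}, {"query": {"match_all": {}}, "size": 2}),
]
payload = build_msearch_body(pairs)
print(payload)
```

Keeping the serialization in one place makes it harder to forget the trailing newline or to emit a body without its header.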

The sequence-of-dicts way is almost the same as the previous one, except that you don't need to convert it to a string:

def msearch():
    es = get_es_instance()

    request = []

    req_head = {'index': 'my_test_index', 'type': 'doc_type_1'}
    req_body = {
        'query': {'term': {'text': 'bag'}},
        'from': 0, 'size': 2
    }
    request.extend([req_head, req_body])

    req_head = {'index': 'my_test_index', 'type': 'doc_type_2'}
    req_body = {
        'query': {'range': {'price': {'gte': 100, 'lt': 300}}},
        'from': 0, 'size': 2
    }
    request.extend([req_head, req_body])

    resp = es.msearch(body=request)

Here is the structure it returns. Read more about msearch.
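
As a rough illustration of that structure, the sketch below walks a hand-written sample response. The `sample_resp` dict is a trimmed, hypothetical stand-in (real responses carry more fields), and `collect_hits` is an illustrative helper name; it assumes that each entry under `responses` holds either a `hits` object or an `error` object.

```python
# Hypothetical msearch response, trimmed to the fields used below.
sample_resp = {
    "responses": [
        {"hits": {"total": 1, "hits": [{"_id": "1", "_source": {"text": "bag"}}]}},
        {"error": {"type": "index_not_found_exception"}, "status": 404},
    ]
}


def collect_hits(resp):
    """Return one list of hits per sub-search; failed searches yield None."""
    results = []
    for item in resp["responses"]:
        if "error" in item:
            results.append(None)  # this sub-search failed
        else:
            results.append(item["hits"]["hits"])
    return results


hits_per_query = collect_hits(sample_resp)
```

The entries in `responses` come back in the same order as the searches in the request, so the result list can be matched to the queries by position.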

Answer 2

If you are using elasticsearch-dsl, you can use the MultiSearch class.

Example from the documentation:

from elasticsearch_dsl import MultiSearch, Search

ms = MultiSearch(index='blogs')

ms = ms.add(Search().filter('term', tags='python'))
ms = ms.add(Search().filter('term', tags='elasticsearch'))

responses = ms.execute()

for response in responses:
    print("Results for query %r." % response.search.query)
    for hit in response:
        print(hit.title)

Answer 3

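Here is what I came up with. I use the same doc type and index throughout, so I optimized the code to run multiple queries with the same header: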

import json
import logging
import time

from elasticsearch import Elasticsearch
from elasticsearch import exceptions as es_exceptions

RETRY_ATTEMPTS = 10
RECONNECT_SLEEP_SECS = 0.5

def msearch(es_conn, queries, index, doc_type, retries=0):
    """
    Es multi-search query
    :param es_conn: Elasticsearch, client connection to use
    :param queries: list of dict, es queries
    :param index: str, index to query against
    :param doc_type: str, defined doc type i.e. event
    :param retries: int, current retry attempt
    :return: list, found docs
    """
    search_header = json.dumps({'index': index, 'type': doc_type})
    request = ''
    for q in queries:
        # request head, body pairs
        request += '{}\n{}\n'.format(search_header, json.dumps(q))
    try:
        resp = es_conn.msearch(body=request, index=index)
        found = [r['hits']['hits'] for r in resp['responses']]
    except (es_exceptions.ConnectionTimeout, es_exceptions.ConnectionError,
            es_exceptions.TransportError):  # pragma: no cover
        logging.warning("msearch connection failed, retrying...")  # Retry on timeout
        if retries > RETRY_ATTEMPTS:  # pragma: no cover
            raise
        time.sleep(RECONNECT_SLEEP_SECS)
        found = msearch(es_conn, queries, index, doc_type, retries=retries + 1)
    except Exception as e:  # pragma: no cover
        logging.critical("msearch error {} on query {}".format(e, queries))
        raise
    return found

es_conn = Elasticsearch()
queries = []
queries.append(
    {"min_score": 2.0, "query": {"bool": {"should": [{"match": {"name.tokenized": {"query": "batman"}}}]}}}
)
queries.append(
    {"min_score": 1.0, "query": {"bool": {"should": [{"match": {"name.tokenized": {"query": "ironman"}}}]}}}
)
queries.append(
    {"track_scores": True, "min_score": 9.0, "query":
        {"bool": {"should": [{"match": {"name": {"query": "not-findable"}}}]}}}
)
q_results = msearch(es_conn, queries, index='pipeliner_current', doc_type='event')

If you want to run multiple queries against the same index and doc type, this may be what you are looking for.
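
The retry logic in this answer can also be written as a loop instead of a recursive call. The sketch below is one possible shape, not the author's code: `call_with_retries` and `flaky_search` are hypothetical names, and the built-in `ConnectionError` stands in for the client's exception classes so the example runs without a cluster.

```python
import time


def call_with_retries(func, max_attempts=10, sleep_secs=0.5, retry_on=(Exception,)):
    """Call func() until it succeeds or max_attempts is exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            time.sleep(sleep_secs)


# Stand-in callable that fails twice, then succeeds.
calls = {"count": 0}


def flaky_search():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated timeout")
    return [["doc-a"], ["doc-b"]]


found = call_with_retries(
    flaky_search, max_attempts=5, sleep_secs=0, retry_on=(ConnectionError,)
)
```

With the real client, `func` could be something like `lambda: es_conn.msearch(body=request, index=index)`, with `retry_on` set to the exception classes listed in the answer.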

Answer 4

Got it! Here is what I did, for anyone else...

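import json
from elasticsearch import Elasticsearch

BATCH_SIZE = 20
es = Elasticsearch("myurl")
query_list = ""
query_count = 0
# my_list and constructQuery are defined elsewhere in the author's code
for obj in my_list:
    query = constructQuery(obj)
    # each search is a header line plus a body line, both newline-terminated;
    # the empty header falls back to the index passed to msearch below
    query_list += json.dumps({}) + "\n"
    query_list += json.dumps(query) + "\n"
    query_count += 1
    if query_count == BATCH_SIZE:
        es.msearch(index="m_index", body=query_list)
        query_list, query_count = "", 0
if query_list:  # send the final, partial batch
    es.msearch(index="m_index", body=query_list)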

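I got tripped up by having to add the index twice. Even when using the Python client, you still have to include the index part described in the original docs. It works now!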
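
The batching idea in this answer can be sketched as a generator that needs no cluster to try out. `batched_msearch_bodies` is a hypothetical helper name, the sample queries are made up, and the empty `{}` header assumes that a default index is passed to `msearch` separately, as the answer does.

```python
import json


def batched_msearch_bodies(queries, batch_size=20):
    """Yield msearch bodies containing at most batch_size searches each."""
    lines = []
    for count, query in enumerate(queries, start=1):
        lines.append(json.dumps({}))     # empty header: rely on the default index
        lines.append(json.dumps(query))  # body line for this search
        if count % batch_size == 0:
            yield "\n".join(lines) + "\n"
            lines = []
    if lines:  # final, partial batch
        yield "\n".join(lines) + "\n"


queries = [{"query": {"match": {"name": "item-%d" % i}}} for i in range(45)]
bodies = list(batched_msearch_bodies(queries))
```

Each yielded string could then be sent with a call such as `es.msearch(index="m_index", body=body)`.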

How to create the request body for Python Elasticsearch msearch
