How to create the request body for a Python Elasticsearch msearch

Date: 2015-02-16 16:43:10

Tags: python elasticsearch

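I am trying to run a multi search request with the Elasticsearch Python client. I can run a single search correctly, but I cannot figure out how to format the request for msearch. According to the documentation, the body of the request needs to be formatted as:

The request definitions (metadata-search request definition pairs), as either a newline separated string, or a sequence of dicts to serialize (one per row).

What is the best way to create this request body? I have been looking for examples but cannot seem to find any.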

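4 answers:

Answer 0: (score: 15)

If you follow the demo in the official doc (even though it is for the Bulk API), you will see how to build your request in Python with the Elasticsearch client:

Here is the newline-separated string way: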

import json

def msearch():
    es = get_es_instance()

    search_arr = []
    # req_head
    search_arr.append({'index': 'my_test_index', 'type': 'doc_type_1'})
    # req_body
    search_arr.append({"query": {"term" : {"text" : "bag"}}, 'from': 0, 'size': 2})

    # req_head
    search_arr.append({'index': 'my_test_index', 'type': 'doc_type_2'})
    # req_body
    search_arr.append({"query": {"match_all" : {}}, 'from': 0, 'size': 2})

    # one JSON document per line, each line terminated by a newline
    request = ''
    for each in search_arr:
        request += '%s\n' % json.dumps(each)

    # as you can see, you just need to feed the <body> parameter,
    # and don't need to specify the <index> and <doc_type> as usual 
    resp = es.msearch(body = request)

As you can see, the final request is made up of several req_units. Each req_unit is structured as follows:

request_header(search control about index_name, optional mapping-types, search-types etc.)\n
request_body(which involves query detail about this request)\n
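
The header/body layout above can be sketched as a small helper that is independent of any client. `build_msearch_body` and the sample index name are hypothetical names used only for illustration; the helper just produces the newline-delimited string and does not contact Elasticsearch.

```python
import json

def build_msearch_body(pairs):
    """Serialize (header, body) dict pairs into a newline-delimited string.

    Each search contributes two lines: the header, then the body.
    Every line, including the last one, ends with a newline.
    """
    lines = []
    for header, body in pairs:
        lines.append(json.dumps(header))
        lines.append(json.dumps(body))
    return "".join(line + "\n" for line in lines)

# Illustrative searches; the index name is an assumption.
pairs = [
    ({"index": "my_test_index"}, {"query": {"match_all": {}}, "size": 2}),
    ({"index": "my_test_index"}, {"query": {"term": {"text": "bag"}}, "size": 2}),
]
body = build_msearch_body(pairs)
print(body)
```

The resulting string could then be passed as `es.msearch(body=body)`, in the same way as the `request` string in the example above.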

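The sequence-of-dicts way is almost the same as the newline-separated string way, except that you do not need to convert the request to a string: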

def msearch():
    es = get_es_instance()

    request = []

    req_head = {'index': 'my_test_index', 'type': 'doc_type_1'}
    req_body = {
        'query': {'term': {'text' : 'bag'}}, 
        'from' : 0, 'size': 2  }
    request.extend([req_head, req_body])

    req_head = {'index': 'my_test_index', 'type': 'doc_type_2'}
    req_body = {
        'query': {'range': {'price': {'gte': 100, 'lt': 300}}},
        'from' : 0, 'size': 2  }
    request.extend([req_head, req_body])

    resp = es.msearch(body = request)

Here is the structure it returns. Read more about msearch.
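
As a rough sketch of handling that return value: the msearch response carries a `responses` list with one entry per search, in request order, and a failed search appears as an entry with an `error` key instead of `hits`. The `sample_resp` dict below is made-up data shaped like such a response, and `collect_hits` is a hypothetical helper name.

```python
def collect_hits(resp):
    """Return one list of hits per search; an empty list where a search failed."""
    results = []
    for item in resp.get("responses", []):
        if "error" in item:
            # This individual search failed; the others may still have succeeded.
            results.append([])
        else:
            results.append(item["hits"]["hits"])
    return results

# Made-up data for illustration only, not real Elasticsearch output.
sample_resp = {
    "responses": [
        {"hits": {"total": 1, "hits": [{"_id": "1", "_source": {"text": "bag"}}]}},
        {"error": "illustrative error message", "status": 400},
    ]
}
hits_per_search = collect_hits(sample_resp)
print([len(h) for h in hits_per_search])
```

Checking each entry separately matters because one failing search does not make the whole msearch call fail.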

Answer 1: (score: 4)

If you are using elasticsearch-dsl, you can use the MultiSearch class.

Example from the documentation:

from elasticsearch_dsl import MultiSearch, Search

ms = MultiSearch(index='blogs')

ms = ms.add(Search().filter('term', tags='python'))
ms = ms.add(Search().filter('term', tags='elasticsearch'))

responses = ms.execute()

for response in responses:
    print("Results for query %r." % response.search.query)
    for hit in response:
        print(hit.title)

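Answer 2: (score: 1)

Here is what I came up with. I use the same doc type and index throughout, so I optimized the code to run multiple queries with the same header: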

from elasticsearch import Elasticsearch
from elasticsearch import exceptions as es_exceptions
import json
import logging
import time

RETRY_ATTEMPTS = 10
RECONNECT_SLEEP_SECS = 0.5

def msearch(es_conn, queries, index, doc_type, retries=0):
    """
    Es multi-search query
    :param queries: list of dict, es queries
    :param index: str, index to query against
    :param doc_type: str, defined doc type i.e. event
    :param retries: int, current retry attempt
    :return: list, found docs
    """
    search_header = json.dumps({'index': index, 'type': doc_type})
    request = ''
    for q in queries:
        # request head, body pairs
        request += '{}\n{}\n'.format(search_header, json.dumps(q))
    try:
        resp = es_conn.msearch(body=request, index=index)
        found = [r['hits']['hits'] for r in resp['responses']]
    except (es_exceptions.ConnectionTimeout, es_exceptions.ConnectionError,
            es_exceptions.TransportError):  # pragma: no cover
        logging.warning("msearch connection failed, retrying...")  # Retry on timeout
        if retries > RETRY_ATTEMPTS:  # pragma: no cover
            raise
        time.sleep(RECONNECT_SLEEP_SECS)
        found = msearch(es_conn, queries, index, doc_type, retries=retries + 1)
    except Exception as e:  # pragma: no cover
        logging.critical("msearch error {} on query {}".format(e, queries))
        raise
    return found

es_conn = Elasticsearch()
queries = []
queries.append(
    {"min_score": 2.0, "query": {"bool": {"should": [{"match": {"name.tokenized": {"query": "batman"}}}]}}}
)
queries.append(
    {"min_score": 1.0, "query": {"bool": {"should": [{"match": {"name.tokenized": {"query": "ironman"}}}]}}}
)
queries.append(
    {"track_scores": True, "min_score": 9.0, "query":
        {"bool": {"should": [{"match": {"name": {"query": "not-findable"}}}]}}}
)
q_results = msearch(es_conn, queries, index='pipeliner_current', doc_type='event')

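If you want to run multiple queries against the same index and doc type, this may be what you are looking for.

Answer 3: (score: 0)

Got it! Here is what I did, for anyone else...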

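import json
from elasticsearch import Elasticsearch

es = Elasticsearch("myurl")
query_list = ""
query_count = 0
for obj in my_list:
    # constructQuery is the poster's own helper (not shown)
    query = constructQuery(obj)
    query_count += 1
    # empty header line: the index comes from the index= argument below
    query_list += json.dumps({}) + "\n"
    query_list += json.dumps(query) + "\n"
    if query_count == 20:
        # send the batch of 20 searches, then start a new one
        es.msearch(index="m_index", body=query_list)
        query_list = ""
        query_count = 0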

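I was getting it wrong over having to add the index twice. Even when using the Python client, you still have to include the index part described in the original docs. It works now!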
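
The batching idea above (send every 20 queries) can be sketched without a live cluster by separating body construction from sending. `chunked_msearch_bodies` is a hypothetical helper, and the batch size of 20 simply mirrors the answer. Each body uses an empty `{}` header on the assumption that the index is passed via `index=` on the call.

```python
import json

def chunked_msearch_bodies(queries, batch_size=20):
    """Yield newline-delimited msearch bodies, at most batch_size searches each."""
    for start in range(0, len(queries), batch_size):
        batch = queries[start:start + batch_size]
        lines = []
        for query in batch:
            lines.append(json.dumps({}))     # empty header: index supplied by the caller
            lines.append(json.dumps(query))  # the search itself
        # Every line, including the last, must end with a newline.
        yield "\n".join(lines) + "\n"

# Illustrative queries; the field name and values are assumptions.
queries = [{"query": {"match": {"name": "item-%d" % i}}} for i in range(45)]
bodies = list(chunked_msearch_bodies(queries))
print([body.count("\n") // 2 for body in bodies])
```

Each body could then be sent with `es.msearch(index="m_index", body=body)`. Because the generator slices the list, the final partial batch is produced as well, not only full batches of 20.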