How do I nest structured filter queries?

Time: 2014-12-13 16:33:06

Tags: elasticsearch

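I want to build large nested queries (they will be big but simple), and I keep running into errors when nesting them. I tried several variants (based on the documentation), and the error I usually get is filter malformed, no field after start_object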

The query I want to build is a boolean compound:

  • several fields combined with AND
  • many such groups joined together with OR

The sample data I use:

{'N_timeend_epoch': 10, 'N_marker': True, 'N_hostip': 'A'}
{'N_timeend_epoch': 10, 'N_marker': True, 'N_hostip': 'B'}
{'N_timeend_epoch': 11, 'N_marker': True, 'N_hostip': 'A'}
{'N_timeend_epoch': 11, 'N_marker': True, 'N_hostip': 'B'}
{'N_timeend_epoch': 10, 'N_marker': False, 'N_hostip': 'A'}
{'N_timeend_epoch': 11, 'N_marker': False, 'N_hostip': 'B'}
{'N_timeend_epoch': 11, 'N_marker': False, 'N_hostip': 'B'}

They are loaded correctly into Elasticsearch:

curl http://localhost:9200/yop/_search?pretty
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 7,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "yop",
      "_type" : "document",
      "_id" : "AUpEErMEPK-TLWy_CSAU",
      "_score" : 1.0,
      "_source":{"N_hostip": "A", "N_timeend_epoch": 10, "N_marker": true}
    }, {
      "_index" : "yop",
      "_type" : "document",
      "_id" : "AUpEErMEPK-TLWy_CSAZ",
      "_score" : 1.0,
      "_source":{"N_hostip": "B", "N_timeend_epoch": 11, "N_marker": false}
    }, 
    (...)

I am looking for entries with a specific N_timeend_epoch and N_hostip. The following code builds the search query:

import json
import requests

list_markers = list()

for N_hostip, N_timeend_epoch in [("A", 10), ("B", 10)]:
    list_markers.append(
        {
            "query":
             {
                 "filtered":
                     {
                         "filter":
                             {
                                "bool":
                                    {
                                        "must":
                                            [
                                                {"N_hostip": N_hostip},
                                                {'N_timeend_epoch': N_timeend_epoch}
                                            ]
                                    }
                        }
                     }
             }
        }
    )

q = {
    "query": {
        "filtered": {
            "filter": { "bool": { "should": list_markers } }
        }
    }
}

url = "http://localhost:9200/yop/_search"
r = requests.get(url=url, data=json.dumps(q))
print(r.json())

I expect to get the documents

    {'N_timeend_epoch': 10, 'N_marker': True, 'N_hostip': 'A'},
    {'N_timeend_epoch': 10, 'N_marker': True, 'N_hostip': 'B'},
    {'N_timeend_epoch': 10, 'N_marker': False, 'N_hostip': 'A'},

The JSON built above (json.dumps(q)) is

{
   "query":{
      "filtered":{
         "filter":{
            "bool":{
               "should":[
                  {
                     "query":{
                        "bool":{
                           "must":[
                              {
                                 "N_hostip":"A"
                              },
                              {
                                 "N_timeend_epoch":10
                              }
                           ]
                        }
                     }
                  },
                  {
                     "query":{
                        "bool":{
                           "must":[
                              {
                                 "N_hostip":"B"
                              },
                              {
                                 "N_timeend_epoch":10
                              }
                           ]
                        }
                     }
                  }
               ]
            }
         }
      }
   }
}

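I don't understand how to combine query and filter/filtered. I tried using only filter/filtered to wrap all the queries, as well as several combinations of these patterns, but they all result in the error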

{u'status': 400, u'error': u'SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[bUIc4GtASg-1iFokFMwI8A][yop][0]: SearchParseException[[yop][0]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query": {"filtered": {"filter": {"bool": {"should": [{"query": {"filtered": {"filter": {"bool": {"must": [{"N_hostip": "A"}, {"N_timeend_epoch": 10}]}}}}}, {"query": {"filtered": {"filter": {"bool": {"must": [{"N_hostip": "B"}, {"N_timeend_epoch": 10}]}}}}}]}}}}}]]]; nested: QueryParsingException[[yop] [_na] filter malformed, no field after start_object]; }{[bUIc4GtASg-1iFokFMwI8A][yop][1]: SearchParseException[[yop][1]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query": {"filtered": {"filter": {"bool": {"should": [{"query": {"filtered": {"filter": {"bool": {"must": [{"N_hostip": "A"}, {"N_timeend_epoch": 10}]}}}}}, {"query": {"filtered": {"filter": {"bool": {"must": [{"N_hostip": "B"}, {"N_timeend_epoch": 10}]}}}}}]}}}}}]]]; nested: QueryParsingException[[yop] [_na] filter malformed, no field after start_object]; }{[bUIc4GtASg-1iFokFMwI8A][yop][2]: SearchParseException[[yop][2]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query": {"filtered": {"filter": {"bool": {"should": [{"query": {"filtered": {"filter": {"bool": {"must": [{"N_hostip": "A"}, {"N_timeend_epoch": 10}]}}}}}, {"query": {"filtered": {"filter": {"bool": {"must": [{"N_hostip": "B"}, {"N_timeend_epoch": 10}]}}}}}]}}}}}]]]; nested: QueryParsingException[[yop] [_na] filter malformed, no field after start_object]; }{[bUIc4GtASg-1iFokFMwI8A][yop][3]: SearchParseException[[yop][3]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query": {"filtered": {"filter": {"bool": {"should": [{"query": {"filtered": {"filter": {"bool": {"must": [{"N_hostip": "A"}, {"N_timeend_epoch": 10}]}}}}}, {"query": {"filtered": {"filter": {"bool": {"must": [{"N_hostip": "B"}, {"N_timeend_epoch": 10}]}}}}}]}}}}}]]]; nested: 
QueryParsingException[[yop] [_na] filter malformed, no field after start_object]; }{[bUIc4GtASg-1iFokFMwI8A][yop][4]: SearchParseException[[yop][4]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query": {"filtered": {"filter": {"bool": {"should": [{"query": {"filtered": {"filter": {"bool": {"must": [{"N_hostip": "A"}, {"N_timeend_epoch": 10}]}}}}}, {"query": {"filtered": {"filter": {"bool": {"must": [{"N_hostip": "B"}, {"N_timeend_epoch": 10}]}}}}}]}}}}}]]]; nested: QueryParsingException[[yop] [_na] filter malformed, no field after start_object]; }]'}

How do I build this kind of query correctly?

Note: my code is Python-based, but the question is about Elasticsearch syntax, so I left off the python tag. Feel free to add it if you think that is better.

1 answer:

Answer 0: (score: 0)

There are several ways to solve this; I'll share one below. I used Elasticsearch 1.3.4.

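Let me first say that if you haven't seen the Sense plug-in for Chrome yet, you should check it out. Its autocomplete helps in sorting out the complicated Elasticsearch syntax. At Qbox we built a modified version that lets us share Elasticsearch code (a combination of Sense and GitHub Gist, you might say). Here is some code I put together while working on your problem: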

http://sense.qbox.io/gist/095b574569026b6d80fdbb0f4a2f66c7de844b13

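I'll get to the details of the last code block shortly, but first here is the setup. I created the index with a mapping that specifies "not_analyzed" on the "N_hostip" field, so we don't have to worry about tokens being changed to lowercase. (This is a common problem: if you don't specify an analyzer, the standard analyzer is used, and it converts tokens to all lowercase.) Then I bulk indexed the documents you listed above: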

curl -XDELETE "http://localhost:9200/yop/"

curl -XPUT "http://localhost:9200/yop/" -d'
{
   "mappings": {
      "doc": {
         "properties": {
            "N_hostip": {
               "type": "string",
               "index": "not_analyzed"
            },
            "N_marker": {
               "type": "boolean"
            },
            "N_timeend_epoch": {
               "type": "long"
            }
         }
      }
   }
}'
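To illustrate why the "not_analyzed" setting matters for term filters, here is a toy Python sketch. It is not Elasticsearch's actual standard analyzer; toy_standard_analyze and term_matches are hypothetical helpers that only mimic the lowercasing behaviour described above.

```python
import re

def toy_standard_analyze(text):
    # Rough stand-in (an assumption, not the real analyzer):
    # split on non-word characters and lowercase each token.
    return [token.lower() for token in re.findall(r"\w+", text)]

def term_matches(indexed_tokens, term):
    # A term filter compares the term exactly as given; it is not analyzed.
    return term in indexed_tokens

analyzed_tokens = toy_standard_analyze("A")  # what an analyzed field would store
not_analyzed_tokens = ["A"]                  # a not_analyzed field keeps the value as-is

print(term_matches(analyzed_tokens, "A"))      # False
print(term_matches(analyzed_tokens, "a"))      # True
print(term_matches(not_analyzed_tokens, "A"))  # True
```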

curl -XPOST "http://localhost:9200/yop/_bulk/" -d'
{"index": {"_index": "yop", "_type": "doc"}}
{"N_timeend_epoch": 10, "N_marker": true, "N_hostip": "A"}
{"index": {"_index": "yop", "_type": "doc"}}
{"N_timeend_epoch": 10, "N_marker": true, "N_hostip": "B"}
{"index": {"_index": "yop", "_type": "doc"}}
{"N_timeend_epoch": 11, "N_marker": true, "N_hostip": "A"}
{"index": {"_index": "yop", "_type": "doc"}}
{"N_timeend_epoch": 11, "N_marker": true, "N_hostip": "B"}
{"index": {"_index": "yop", "_type": "doc"}}
{"N_timeend_epoch": 10, "N_marker": false, "N_hostip": "A"}
{"index": {"_index": "yop", "_type": "doc"}}
{"N_timeend_epoch": 11, "N_marker": false, "N_hostip": "B"}
{"index": {"_index": "yop", "_type": "doc"}}
{"N_timeend_epoch": 11, "N_marker": false, "N_hostip": "B"}
'
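If the documents start out as Python dicts, as in the question, the same bulk body could be generated rather than typed by hand. This is a minimal sketch; build_bulk_payload is a hypothetical helper name, and the index and type names are taken from the curl request above.

```python
import json

docs = [
    {"N_timeend_epoch": 10, "N_marker": True, "N_hostip": "A"},
    {"N_timeend_epoch": 10, "N_marker": True, "N_hostip": "B"},
    {"N_timeend_epoch": 11, "N_marker": True, "N_hostip": "A"},
    {"N_timeend_epoch": 11, "N_marker": True, "N_hostip": "B"},
    {"N_timeend_epoch": 10, "N_marker": False, "N_hostip": "A"},
    {"N_timeend_epoch": 11, "N_marker": False, "N_hostip": "B"},
    {"N_timeend_epoch": 11, "N_marker": False, "N_hostip": "B"},
]

def build_bulk_payload(documents, index="yop", doc_type="doc"):
    lines = []
    for doc in documents:
        # Each document is preceded by an action line naming its index and type.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # The bulk body is newline-delimited JSON and must end with a newline.
    return "\n".join(lines) + "\n"

payload = build_bulk_payload(docs)
print(payload)
```

The resulting string could then be sent as the request body to the _bulk endpoint shown above, for example with requests.post(url, data=payload).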

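Then I used Sense to help me set up the query (the autocomplete nicely tells you which blocks are allowed inside which, though it isn't perfect). I used a top-level filter because it is more efficient than a query, and no query is needed here. Also note that my outer should contains two nested bool filters, each with a must clause holding two term filters (if I hadn't used not_analyzed in the mapping, I would have needed "N_hostip": "a", and so on). So a document is returned if it matches either of the two clauses in should.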

curl -XPOST "http://localhost:9200/yop/_search" -d'
{
    "filter": {
        "bool": {
            "should": [
               {
                   "bool": {
                       "must": [
                          { "term": { "N_hostip": "A" } },
                          { "term": { "N_timeend_epoch": 10 } }
                       ]
                   }
               },
               {
                   "bool": {
                       "must": [
                          { "term": { "N_hostip": "B" } },
                          { "term": { "N_timeend_epoch": 10 } }
                       ]
                   }
               }
            ]
        }
    }
}'

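This returns what I think you are expecting. Translating it into Python code should be straightforward (if you haven't already, be sure to check out the Python client).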
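As a rough sketch of that translation, the loop from the question could build the nested bool/term structure directly. build_or_of_ands is a hypothetical helper name, and the body assumes the Elasticsearch 1.x top-level filter syntax used in the curl request above.

```python
import json

def build_or_of_ands(pairs):
    """Build a search body: OR over (N_hostip AND N_timeend_epoch) pairs."""
    clauses = [
        {
            "bool": {
                "must": [
                    # Each condition names its filter type ("term") explicitly.
                    {"term": {"N_hostip": hostip}},
                    {"term": {"N_timeend_epoch": epoch}},
                ]
            }
        }
        for hostip, epoch in pairs
    ]
    # Nested bool filters go straight into "should", with no "query"/"filtered" wrapper.
    return {"filter": {"bool": {"should": clauses}}}

body = build_or_of_ands([("A", 10), ("B", 10)])
print(json.dumps(body, indent=2))
```

The body can then be sent the same way as in the question, for example requests.post("http://localhost:9200/yop/_search", data=json.dumps(body)). Compared with the original attempt, the differences are that every leaf condition is wrapped in a term filter and that the inner clauses are plain bool filters rather than full query objects.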