Question

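I am having trouble searching nested documents with the elasticsearch_dsl and elasticsearch libraries in Python.

I can successfully search the top-level (i.e. non-nested) parts of a document, but every attempt to search the nested parts has failed for some reason.

I have searched StackOverflow and the web for a definitive guide to searching nested documents with Python, but I keep coming up short.

Here is the sample document I am working with: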

{"username": "nancy",
"codeData": [
 {"code": "B1", "order": "2"}, 
 {"code": "L4", "order": "1"}
  ] 
}

I have 7 documents in the index, which I have mapped as follows:

request_body = {
    "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 1
    },
    "mappings": {
        "testNesting": {
            "properties": {
                "username": {"type": "text"},
                "codeData": {
                    "type": "nested",
                    "properties": {
                        "code": {"type": "text"},
                        "order": {"type": "text"}
                    }
                }
            }
        }
    }
}
es.indices.create(index="nest-test6", body=request_body)

The following search works fine:

s = Search(using = es).query("match", username = "nancy")
response = s.execute()
print(response.to_dict())

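Now I want to search for documents whose "codeData" contains code = "B1".

I have listed the sources I tried at the bottom of this question. I hope this can become a definitive guide that people can refer to when trying to query nested documents with Python.

Here is what I have tried so far: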

q = Q("match", code = "L4")
s = Search(using = es, index = "nest-test6").query("nested", path = "codeData", query = q)

The above results in a TransportError (400, failed to create query), followed by the query itself with a bunch of \n after every item.

q = Q("match", **{"codeData.code"" : "L4"})
s = Search(using = es, index = "nest-test6").query("nested", path = "codeData", query = q)

The above results in a syntax error on line 1.

s = Search(using = es, index = "nest-test6").query("nested", path = "lithologyData", query = **Q{"match":{ "lithology":"L4"}})

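The above also results in a syntax error.

I have tried several other approaches, but my data structure has changed since then, so listing them here would not make sense in the context of the document above.

I don't know how to query these nested objects. I feel I am missing a few pieces of information:

What are the Q / F keywords, and how are they used?
I understand that I have to specify the path to the queried term as level1.nameOfObjectBeingQueried. If that is not a valid keyword in the Python library, how do I handle it?

If I have missed any other sources, I would really appreciate someone pointing me to them!

Other attempts that failed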

s1 = Search(using = es).query("match", username = "nancy")
q1 = Q("match", lithologyData__lithology = "L4")
q2 = Q("match", **{"lithologyData.lithology":"L4"})
s2 = Search(using = es, index = "nest-test6").query("nested", path = "lithologyData", query = Q("match",lithologyData__lithology="L4"))
s3 = Search(using = es, index = "nest-test6").query("nested", path = "lithologyData", query = q1)
s4 = Search(using = es, index = "nest-test6").query("nested", path = "lithologyData", query = q2)
response = s1.execute()
response2 = s2.execute()
response3 = s3.execute()
response4 = s4.execute()

Response 1: works

Response 2: fails:

TransportError(400, u'search_phase_execution_exception', u'failed to create query: {\n  "nested" : {\n    "query" : {\n      "match" : {\n        "codeData.code" : {\n          "query" : "L4",\n          "operator" : "OR",\n          "prefix_length" : 0,\n          "max_expansions" : 50,\n          "fuzzy_transpositions" : true,\n          "lenient" : false,\n          "zero_terms_query" : "NONE",\n          "auto_generate_synonyms_phrase_query" : true,\n          "boost" : 1.0\n        }\n      }\n    },\n    "path" : "codeData",\n    "ignore_unmapped" : false,\n    "score_mode" : "avg",\n    "boost" : 1.0\n  }\n}')

Response 3: fails:

TransportError(400, u'search_phase_execution_exception', u'failed to create query: {\n  "nested" : {\n    "query" : {\n      "match" : {\n        "codeData.code" : {\n          "query" : "L4",\n          "operator" : "OR",\n          "prefix_length" : 0,\n          "max_expansions" : 50,\n          "fuzzy_transpositions" : true,\n          "lenient" : false,\n          "zero_terms_query" : "NONE",\n          "auto_generate_synonyms_phrase_query" : true,\n          "boost" : 1.0\n        }\n      }\n    },\n    "path" : "codeData",\n    "ignore_unmapped" : false,\n    "score_mode" : "avg",\n    "boost" : 1.0\n  }\n}')

Response 4: fails:

TransportError(400, u'search_phase_execution_exception', u'failed to create query: {\n "nested" : {\n "query" : {\n "match" : {\n "codeData.code" : {\n "query" : "L4",\n "operator" : "OR",\n "prefix_length" : 0,\n "max_expansions" : 50,\n "fuzzy_transpositions" : true,\n "lenient" : false,\n "zero_terms_query" : "NONE",\n "auto_generate_synonyms_phrase_query" : true,\n "boost" : 1.0\n }\n }\n },\n "path" : "codeData",\n "ignore_unmapped" : false,\n "score_mode" : "avg",\n "boost" : 1.0\n }\n}')

Other resources reviewed

ElasticSearch Nested Query Reference

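The problem here is that it only describes how to run this query through the REST API. The descriptions of why the elasticsearch_dsl and elasticsearch Python libraries were created specifically mention the difficulty of sending JSON structures directly. They frequently cite the potential for user error, but I think there are other aspects I don't understand.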

Github Issue on ElasticSearch_DSL py

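Here they suggest unpacking a dictionary, because you cannot use "level1.level2" as a keyword argument. However, the creator agrees that this is far from ideal. The issue dates from 2014, and according to other answers there now seems to be a better way, but I could not find the details.

ElasticSearch_DSL Python Documentation - While this is useful, the documentation does not contain a single example of a nested search/query.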

Answer 1

To query a nested field, you appear to have the right approach:

q = Q("match", codeData__code="L4")
s = Search(using=es, index="nest-test6").query("nested", path="codeData", query=q)

Any __ in a kwarg passed to Q is translated to . internally. Alternatively, you can always rely on Python kwarg expansion:

q = Q('match', **{"codeData.code": "L4"})

That should also work just fine; your example simply had an extra " in it, which is why Python rejected it.
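
To make the equivalence of the two spellings concrete, here is a minimal plain-Python sketch. It does not use elasticsearch_dsl at all: `match_query` and `nested_query` are hypothetical stand-ins written only for illustration, and they mimic the `__` to `.` translation described above rather than reproduce the library's actual code. The resulting dictionary has the shape of a standard Elasticsearch nested query body.

```python
def match_query(**fields):
    """Hypothetical stand-in for Q("match", ...).

    Builds a match clause and maps "__" in keyword names to ".",
    mimicking the translation described in this answer.
    """
    return {"match": {name.replace("__", "."): value for name, value in fields.items()}}


def nested_query(path, query):
    """Hypothetical helper: wrap a clause in a nested query for the given path."""
    return {"nested": {"path": path, "query": query}}


# Spelling 1: double underscore, which is a valid Python keyword name.
via_dunder = match_query(codeData__code="L4")

# Spelling 2: kwarg expansion, needed because codeData.code=... is not valid syntax.
via_expansion = match_query(**{"codeData.code": "L4"})

# Both spellings end up as the same clause.
assert via_dunder == via_expansion

body = {"query": nested_query("codeData", via_dunder)}
print(body)
# {'query': {'nested': {'path': 'codeData', 'query': {'match': {'codeData.code': 'L4'}}}}}
```

The inner field name keeps the full dotted path (codeData.code), while path names only the nested object (codeData).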
