Question

我希望能够将多字搜索与多个字段匹配，其中搜索的每个字都包含在任何字段中，任意组合中。我想要避免使用 query_string。

curl -X POST "http://localhost:9200/index/document/1" -d '{"id":1,"firstname":"john","middlename":"clark","lastname":"smith"}'
curl -X POST "http://localhost:9200/index/document/2" -d '{"id":2,"firstname":"john","middlename":"paladini","lastname":"miranda"}'

我希望搜索“John Smith”仅匹配文档1.以下查询执行我需要的操作，但我宁愿避免使用query_string，以防用户传递“OR”，“AND”和任何其他先进的参数。

curl -X GET 'http://localhost:9200/index/_search?per_page=10&pretty' -d '{
  "query": {
    "query_string": {
      "query": "john smith",
      "default_operator": "AND",
      "fields": [
        "firstname",
        "lastname",
        "middlename"
      ]
    }
  }
}'

Answer 1

您正在寻找的是multi-match query，但它的表现并不像您希望的那样。

比较multi_match与query_string的{{3}}输出。

multi_match（与运营商and）将确保所有字词至少存在于一个字段中：

curl -XGET 'http://127.0.0.1:9200/_validate/query?pretty=1&explain=true'  -d '
{
   "multi_match" : {
      "operator" : "and",
      "fields" : [
         "firstname",
         "lastname"
      ],
      "query" : "john smith"
   }
}
'

# {
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 1,
#       "total" : 1
#    },
#    "explanations" : [
#       {
#          "index" : "test",
#          "explanation" : "((+lastname:john +lastname:smith) | (+firstname:john +firstname:smith))",
#          "valid" : true
#       }
#    ],
#    "valid" : true
# }

虽然query_string（使用default_operator AND）会检查至少一个字段中是否存在EACH字词：

curl -XGET 'http://127.0.0.1:9200/_validate/query?pretty=1&explain=true'  -d '
{
   "query_string" : {
      "fields" : [
         "firstname",
         "lastname"
      ],
      "query" : "john smith",
      "default_operator" : "AND"
   }
}
'

# {
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 1,
#       "total" : 1
#    },
#    "explanations" : [
#       {
#          "index" : "test",
#          "explanation" : "+(firstname:john | lastname:john) +(firstname:smith | lastname:smith)",
#          "valid" : true
#       }
#    ],
#    "valid" : true
# }

所以你有几个选择来实现你的目标：

在使用query_string
预先解析搜索字词以提取每个字词，然后为每个字词生成multi_match个查询
在映射中使用index_name为名称字段将其数据编入单个字段，然后您可以将其用于搜索。（就像您自己的自定义all字段）：

如下：

curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1'  -d '
{
   "mappings" : {
      "test" : {
         "properties" : {
            "firstname" : {
               "index_name" : "name",
               "type" : "string"
            },
            "lastname" : {
               "index_name" : "name",
               "type" : "string"
            }
         }
      }
   }
}
'

curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1'  -d '
{
   "firstname" : "john",
   "lastname" : "smith"
}
'

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1'  -d '
{
   "query" : {
      "match" : {
         "name" : {
            "operator" : "and",
            "query" : "john smith"
         }
      }
   }
}
'

# {
#    "hits" : {
#       "hits" : [
#          {
#             "_source" : {
#                "firstname" : "john",
#                "lastname" : "smith"
#             },
#             "_score" : 0.2712221,
#             "_index" : "test",
#             "_id" : "VJFU_RWbRNaeHF9wNM8fRA",
#             "_type" : "test"
#          }
#       ],
#       "max_score" : 0.2712221,
#       "total" : 1
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 33
# }

但请注意，firstname和lastname不再可以独立搜索。这两个字段的数据已编入索引name。

您可以将validate与path参数一起使用，以便可以单独和一起搜索它们，如下所示：

curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1'  -d '
{
   "mappings" : {
      "test" : {
         "properties" : {
            "firstname" : {
               "fields" : {
                  "firstname" : {
                     "type" : "string"
                  },
                  "any_name" : {
                     "type" : "string"
                  }
               },
               "path" : "just_name",
               "type" : "multi_field"
            },
            "lastname" : {
               "fields" : {
                  "any_name" : {
                     "type" : "string"
                  },
                  "lastname" : {
                     "type" : "string"
                  }
               },
               "path" : "just_name",
               "type" : "multi_field"
            }
         }
      }
   }
}
'

curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1'  -d '
{
   "firstname" : "john",
   "lastname" : "smith"
}
'

搜索any_name字段有效：

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1'  -d '
{
   "query" : {
      "match" : {
         "any_name" : {
            "operator" : "and",
            "query" : "john smith"
         }
      }
   }
}
'

# {
#    "hits" : {
#       "hits" : [
#          {
#             "_source" : {
#                "firstname" : "john",
#                "lastname" : "smith"
#             },
#             "_score" : 0.2712221,
#             "_index" : "test",
#             "_id" : "Xf9qqKt0TpCuyLWioNh-iQ",
#             "_type" : "test"
#          }
#       ],
#       "max_score" : 0.2712221,
#       "total" : 1
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 11
# }

在firstname中搜索john AND smith不起作用：

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1'  -d '
{
   "query" : {
      "match" : {
         "firstname" : {
            "operator" : "and",
            "query" : "john smith"
         }
      }
   }
}
'

# {
#    "hits" : {
#       "hits" : [],
#       "max_score" : null,
#       "total" : 0
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 2
# }

但只搜索firstname john只能正常工作：

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1'  -d '
{
   "query" : {
      "match" : {
         "firstname" : {
            "operator" : "and",
            "query" : "john"
         }
      }
   }
}
'

# {
#    "hits" : {
#       "hits" : [
#          {
#             "_source" : {
#                "firstname" : "john",
#                "lastname" : "smith"
#             },
#             "_score" : 0.30685282,
#             "_index" : "test",
#             "_id" : "Xf9qqKt0TpCuyLWioNh-iQ",
#             "_type" : "test"
#          }
#       ],
#       "max_score" : 0.30685282,
#       "total" : 1
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 3
# }

Answer 2

如果用户传递“OR”，“AND”和任何其他高级参数，我宁愿避免使用query_string。

根据我的经验，使用反斜杠转义特殊字符是一种简单有效的解决方案。该列表可在文档http://lucene.apache.org/core/4_5_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package_description中找到，另外还有AND / OR / NOT / TO。

Answer 3

我认为“匹配”查询是您正在寻找的：

“匹配查询系列不会通过”查询解析“过程。它不支持字段名称前缀，通配符或其他”高级“功能。因此，它失败的可能性非常小/不存在，当它只是分析和运行该文本作为查询行为（这通常是文本搜索框所做的）时，它提供了一个很好的行为“

http://www.elasticsearch.org/guide/reference/query-dsl/match-query.html

Answer 4

现在您可以在multi_match中使用cross_fields类型

GET /_validate/query?explain
{
    "query": {
        "multi_match": {
            "query":       "peter smith",
            "type":        "cross_fields", 
            "operator":    "and",
            "fields":      [ "firstname", "lastname", "middlename" ]
        }
    }
}

跨领域采用以术语为中心的方法。它将所有字段视为一个大字段，并在任何字段中查找每个术语。

需要注意的一件事是，如果您希望它发挥最佳作用，则所有被分析的字段都应使用相同的分析仪（标准，英文等）：

为使cross_fields查询类型最佳工作，所有字段均应   具有相同的分析仪。共享分析器的字段被分组   一起作为混合字段。

如果您包含具有不同分析链的字段，它们将是   以与best_fields相同的方式添加到查询中。例如，   如果我们在前面的查询中添加了标题字段（假设它使用了   不同的分析仪），说明如下：

（+ title：peter + title：smith）（+ blended（“ peter”，字段：[first_name，   last_name]）+ blended（“ smith”，栏位：[first_name，last_name]））

多字段，多字，匹配没有query_string

4 个答案: