Question

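Problem

I am developing an autocomplete application using ElasticSearch 6.2.3. I would like my query results (a list of pages with a Name field) to be ordered using the following priority:

1) Prefix match at the start of "Name" (prefix query)
2) Any other exact (whole-word) match within "Name" (term query)
3) Fuzzy match (this is currently done on a different field from Name using an ngram tokenizer, so I assume it is not relevant to my problem, but I would like to apply it on the Name field as well)

My attempted solution

I will be using a Bool/Should query consisting of three queries (corresponding to the three priorities above), using boost to define their relative importance.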
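
For illustration, here is a rough sketch of what such a query body might look like, built as a plain Python dict. The boost values (3, 2, 1) and the `Name.ngram` field name are assumptions made up for this example, not taken from the actual index. The text is lowercased before it goes into the prefix and term clauses, because those two query types are not analyzed.

```python
import json


def build_autocomplete_query(text):
    """Build a bool/should body with one clause per priority level.

    Boost values and the "Name.ngram" field are hypothetical placeholders.
    """
    # prefix and term queries are not analyzed, so normalize the input here
    term = text.strip().lower()
    return {
        "query": {
            "bool": {
                "should": [
                    # Priority 1: prefix match on the keyword-tokenized sub-field
                    {"prefix": {"Name.raw": {"value": term, "boost": 3.0}}},
                    # Priority 2: exact whole-word match on the standard-tokenized field
                    {"term": {"Name": {"value": term, "boost": 2.0}}},
                    # Priority 3: ngram-based match (hypothetical sub-field name)
                    {"match": {"Name.ngram": {"query": text, "boost": 1.0}}},
                ]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_autocomplete_query("Harry"), indent=2))
```

The relative order of the boosts is what matters here; the absolute numbers would need tuning against real data.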

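The problem I am having is with the prefix query: it does not appear to lowercase my search text, even though my search analyzer has the lowercase filter. For example, the query below returns "Harry Potter" for 'harry' but returns zero results for 'Harry':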

{ "query": { "prefix": { "Name.raw" : "Harry" } } }

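I have verified using the _analyze API that both of my analyzers do indeed lowercase the text "Harry" to "harry". Where am I going wrong?

From the ES documentation I understand that I need to analyze the Name field in two different ways to be able to use both Prefix and Term queries:

using the "keyword" tokenizer to enable the Prefix query (I have applied this on the .raw field)
using a standard analyzer to enable the Term query (I have applied this on the Name field)

I have checked duplicate questions such as this one, but the answers have not helped.

My mapping and settings are below.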

ES index mapping

{
    "myIndex": {
        "mappings": {
            "pages": {
                "properties": {
                    "Id": {},
                    "Name": {
                        "type": "text",
                        "fields": {
                            "raw": {
                                "type": "text",
                                "analyzer": "keywordAnalyzer",
                                "search_analyzer": "pageSearchAnalyzer"
                            }
                        },
                        "analyzer": "pageSearchAnalyzer"
                    },
                    "Tokens": {} // Other fields not important for this question
                }
            }
        }
    }
}

ES index settings

{
    "myIndex": {
        "settings": {
            "index": {
                "analysis": {
                    "filter": {
                        "ngram": {
                            "type": "edgeNGram",
                            "min_gram": "2",
                            "max_gram": "15"
                        }
                    },
                    "analyzer": {
                        "keywordAnalyzer": {
                            "filter": [
                                "trim",
                                "lowercase",
                                "asciifolding"
                            ],
                            "type": "custom",
                            "tokenizer": "keyword"
                        },
                        "pageSearchAnalyzer": {
                            "filter": [
                                "trim",
                                "lowercase",
                                "asciifolding"
                            ],
                            "type": "custom",
                            "tokenizer": "standard"
                        },
                        "pageIndexAnalyzer": {
                            "filter": [
                                "trim",
                                "lowercase",
                                "asciifolding",
                                "ngram"
                            ],
                            "type": "custom",
                            "tokenizer": "standard"
                        }
                    }
                },
                "number_of_replicas": "1",
                "uuid": "l2AXoENGRqafm42OSWWTAg",
                "version": {}
            }
        }
    }
}

Answer 1

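The prefix query does not analyze the search term, so the text you pass into it bypasses whatever would be used as the search analyzer (in your case, the configured search_analyzer: pageSearchAnalyzer) and evaluates Harry as-is, directly against the keyword-tokenized, custom-filtered harry potter that was produced by the keywordAnalyzer at index time.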
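
To make the mismatch concrete, here is a small Python sketch that imitates it without a cluster. The helper `normalize_like_keyword_analyzer` is a made-up name, and it only approximates the trim, lowercase and asciifolding filters (stripping combining marks is not the full asciifolding table), so treat it as an illustration rather than a replica of the analyzer.

```python
import unicodedata


def normalize_like_keyword_analyzer(text):
    """Rough stand-in for keyword tokenizer + trim + lowercase + asciifolding."""
    lowered = text.strip().lower()
    # Decompose accented characters, then drop the combining marks
    decomposed = unicodedata.normalize("NFKD", lowered)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Roughly the single term stored in Name.raw at index time
indexed_term = normalize_like_keyword_analyzer("Harry Potter")

# The prefix query compares the raw, unanalyzed text against that term
print(indexed_term.startswith("Harry"))  # False

# Normalizing the text on the application side first makes the prefix line up
print(indexed_term.startswith(normalize_like_keyword_analyzer("Harry")))  # True
```

Applying the same kind of normalization in the application before sending the prefix query would be one way to keep the existing Name.raw mapping unchanged.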

In this case, you need to do one of two things differently:

Since you are using a lowercase filter on the field, you could simply always use a lowercase term in the prefix query (lowercasing on the application side if necessary)
Run a match query against a field analyzed with edge_ngram instead of running a prefix query

Here is an example of the latter:

1) Create the index with an ngram analyzer and a (recommended) standard search analyzer

PUT my_index
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "ngram": {
            "type": "edgeNGram",
            "min_gram": "2",
            "max_gram": "15"
          }
        },
        "analyzer": {
          "pageIndexAnalyzer": {
            "filter": [
              "trim",
              "lowercase",
              "asciifolding",
              "ngram"
            ],
            "type": "custom",
            "tokenizer": "keyword"
          }
        }
      }
    }
  },
  "mappings": {
    "pages": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "ngram": {
              "type": "text",
              "analyzer": "pageIndexAnalyzer",
              "search_analyzer": "standard"
            }
          }
        }
      }
    }
  }
}

2) Index some sample documents

POST my_index/pages/_bulk
{"index":{}}
{"name":"Harry Potter"}
{"index":{}}
{"name":"Hermione Granger"}

3) Run a match query against the ngram field

POST my_index/pages/_search
{
  "query": {
    "match": {
      "name.ngram": {
        "query": "Har",
        "operator": "and"
      }
    }
  }
}
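
To see why a capitalized "Har" matches with the edge n-gram approach, the analysis chain can be imitated in a few lines of Python. This is only a simulation under simplifying assumptions: the function names are invented for this sketch, asciifolding is left out, and lowercasing plus whitespace splitting stands in for the standard search analyzer.

```python
def edge_ngrams(token, min_gram=2, max_gram=15):
    """Return the leading substrings of token, from min_gram to max_gram chars."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(name):
    # keyword tokenizer: the whole value is one token, then trim + lowercase
    return set(edge_ngrams(name.strip().lower()))


def search_terms(text):
    # crude stand-in for the standard analyzer: lowercase, split on whitespace
    return text.lower().split()


def matches(name, text):
    # "operator": "and" means every search term must be among the indexed terms
    terms = index_terms(name)
    return all(term in terms for term in search_terms(text))


print(sorted(index_terms("Harry Potter"), key=len)[:4])  # ['ha', 'har', 'harr', 'harry']
print(matches("Harry Potter", "Har"))      # True
print(matches("Hermione Granger", "Har"))  # False
print(matches("Harry Potter", "Potter"))   # False
```

The last line shows a consequence of the keyword tokenizer in this setup: only prefixes of the whole name are indexed, so a word that starts in the middle of the name does not match in this simulation.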

Answer 2

I think it would be better to use a match_phrase_prefix query without the .keyword suffix. Check the documentation here: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query-phrase-prefix.html
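
As a rough idea of what that could look like, here is a sketch that builds a match_phrase_prefix body as a Python dict. It assumes the query targets the analyzed Name field from the question's mapping; the helper name is invented for this example, and max_expansions is shown at its documented default of 50.

```python
import json


def build_phrase_prefix_query(text, field="Name", max_expansions=50):
    """Build a match_phrase_prefix body; the input text is analyzed by Elasticsearch."""
    return {
        "query": {
            "match_phrase_prefix": {
                field: {
                    "query": text,
                    # caps how many terms the last word's prefix may expand to
                    "max_expansions": max_expansions,
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_phrase_prefix_query("Harry"), indent=2))
```

Because match_phrase_prefix analyzes its input, the capitalized "Harry" would go through the field's search analyzer before the last term is treated as a prefix.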

Why is my Elasticsearch prefix query case-sensitive despite using lowercase filters on both index and search?
