Question

My query is very simple:

POST /indexX/document/_search
{
  "query": {
    "match_phrase_prefix": {
      "surname": "grab"
    }
  }
}

With this mapping:

"surname": {
  "type": "string",
  "analyzer": "polish",
  "copy_to": [
    "full_name"
  ]
}

And the index definition (I am using the Stempel (Polish) analysis plugin for Elasticsearch):

POST /indexX
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms_path": "analysis/synonyms.txt"
          },
          "polish_stop": {
            "type": "stop",
            "stopwords_path": "analysis/stopwords.txt"
          },
          "polish_my_stem": {
            "type": "stemmer",
            "rules_path": "analysis/stems.txt"
          }
        },
        "analyzer": {
          "polish_with_synonym": {
            "tokenizer":  "standard",
            "filter": [
              "synonym",
              "lowercase",
              "polish_stop",
              "polish_stem",
              "polish_my_stem"
            ]
          }
        }
      }
    }
  }
}

This query returns zero results. When I change the phrase to GRA or GRABA, it returns 1 result (GRABARZ is the surname). Why does this happen?

I tried max_expansions, with values as high as 1200, and it did not help.

Answer 1

At first glance, your analyzer stems the search term ("grab") and renders it unusable ("grabić").
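
One way to check this is to ask Elasticsearch what the analyzer does to the query text and to the stored surname. The requests below are only a sketch. They assume a pre-5.x Elasticsearch (the "string" mapping type suggests that), where _analyze accepts analyzer and text as URL parameters, and that the Stempel plugin's polish analyzer is the one applied to surname, as in your mapping:

# Assumption: pre-5.x URL-parameter form of the _analyze API
GET /indexX/_analyze?analyzer=polish&text=grab
GET /indexX/_analyze?analyzer=polish&text=gra
GET /indexX/_analyze?analyzer=polish&text=GRABARZ

match_phrase_prefix analyzes the query text first and then uses the last resulting token as the prefix. So if "grab" comes back as a stem such as "grabić", that stem is what gets prefix-matched against the indexed token for GRABARZ, and it can miss even though the raw text "grab" is a prefix of the name. A shorter input such as "gra" that the stemmer leaves alone would still match, which would fit the behaviour you describe.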

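Without going into detail on how to fix this, consider dropping the polish analyzer here. We are talking about people's names, not "ordinary" Polish words.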

I have seen different techniques used for this situation: multi-field search, fuzzy search, phonetic search, dedicated plugins.

Some links:
https://www.elastic.co/blog/multi-field-search-just-got-better
http://www.basistech.com/fuzzy-search-names-in-elasticsearch/
https://www.found.no/play/gist/6c6434c9c638a8596efa
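
To give a rough idea of the fuzzy variant, here is a minimal sketch. It assumes a hypothetical sub-field surname.raw that indexes the name lowercased but without stemming; that sub-field is not part of your current mapping, and the misspelled query text is made up for illustration:

# Assumption: surname.raw is a hypothetical, unstemmed sub-field
POST /indexX/document/_search
{
  "query": {
    "match": {
      "surname.raw": {
        "query": "grabaz",
        "fuzziness": "AUTO"
      }
    }
  }
}

Fuzzy matching tolerates typos in a complete name; it is not a replacement for prefix matching while the user is still typing.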

But I guess that for Polish names, some kind of prefix query on a non-analyzed field would be enough...
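
A minimal sketch of that idea, assuming the same pre-5.x mapping syntax as in the question. The index name names_sketch, the analyzer name lowercase_keyword and the sub-field name raw are all hypothetical. I use a keyword tokenizer with a lowercase filter instead of a strictly not_analyzed field, because a not_analyzed field is case-sensitive and a stored GRABARZ would then not match the prefix "grab":

# Assumption: names_sketch, lowercase_keyword and raw are illustrative names
PUT /names_sketch
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "document": {
      "properties": {
        "surname": {
          "type": "string",
          "analyzer": "polish",
          "fields": {
            "raw": {
              "type": "string",
              "analyzer": "lowercase_keyword"
            }
          }
        }
      }
    }
  }
}

# The prefix query does not analyze its input, so send it already lowercased
POST /names_sketch/document/_search
{
  "query": {
    "prefix": {
      "surname.raw": "grab"
    }
  }
}

The whole surname is indexed as a single lowercased token with no stemming, so "gra", "grab" and "graba" should all behave the same way against GRABARZ.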

Why does the match_phrase_prefix query return wrong results for different phrase lengths?
