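Elasticsearch NGram Analyzer - changing the order of query results

Time: 2019-01-06 19:49:43

Tags: elasticsearch

Elasticsearch query: change the order of displayed results based on score

The current query returns results for the title field in the following order.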

  1. Quick 123
  2. Foxes Quick
  3. Quick
  4. Foxes Quick Quick
  5. Quick Foxes

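Shouldn't 3. be the first result instead?

Also, Foxes Quick Quick contains two occurrences of Quick, so it should get some preference in the query results. But it comes in 4th.

Index settings.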

{
  "fundraisers": {
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "fundraisers",
        "creation_date": "1546515635025",
        "analysis": {
          "analyzer": {
            "my_analyzer": {
              "filter": [
                "lowercase"
              ],
              "tokenizer": "my_tokenizer"
            },
            "search_analyzer_search": {
              "filter": [
                "lowercase"
              ],
              "tokenizer": "search_tokenizer_search"
            }
          },
          "tokenizer": {
            "my_tokenizer": {
              "token_chars": [
                "letter",
                "digit"
              ],
              "min_gram": "3",
              "type": "edge_ngram",
              "max_gram": "50"
            },
            "search_tokenizer_search": {
              "token_chars": [
                "letter",
                "digit",
                "whitespace"
              ],
              "min_gram": "3",
              "type": "ngram",
              "max_gram": "50"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "mVweO4_sT3Ww00MzdLyavw",
        "version": {
          "created": "6020399"
        }
      }
    }
  }
}

Query 

GET fundraisers/_search?explain=true
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "qui",
        "analyzer": "my_analyzer"
      }
    }
  }
}

Mapping

{
  "fundraisers": {
    "mappings": {
      "fundraisers": {
        "properties": {
          "status": {
            "type": "text"
          },
          "suggest": {
            "type": "completion",
            "analyzer": "simple",
            "preserve_separators": true,
            "preserve_position_increments": true,
            "max_input_length": 50
          },
          "title": {
            "type": "text",
            "analyzer": "my_analyzer"
          },
          "twitterUrl": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "videoLinks": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "zipCode": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

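Am I overcomplicating this by using match_phrase, search analyzers, and ngrams, or is there a simpler way to achieve the expected results?

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/6.5/query-dsl-match-query.html

1 Answer:

Answer 0 (score: 0)

Well, first let's create a minimal and reproducible setup: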

PUT test
{
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "1",
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "filter": [
              "lowercase"
            ],
            "tokenizer": "my_tokenizer"
          },
          "search_analyzer_search": {
            "filter": [
              "lowercase"
            ],
            "tokenizer": "search_tokenizer_search"
          }
        },
        "tokenizer": {
          "my_tokenizer": {
            "token_chars": [
              "letter",
              "digit"
            ],
            "min_gram": "3",
            "type": "edge_ngram",
            "max_gram": "50"
          },
          "search_tokenizer_search": {
            "token_chars": [
              "letter",
              "digit",
              "whitespace"
            ],
            "min_gram": "3",
            "type": "ngram",
            "max_gram": "50"
          }
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}

PUT test/_doc/1
{
  "title": "Quick 123"
}
PUT test/_doc/2
{
  "title": "Foxes Quick"
}
PUT test/_doc/3
{
  "title": "Quick"
}
PUT test/_doc/4
{
  "title": "Foxes Quick Quick"
}
PUT test/_doc/5
{
  "title": "Quick Foxes"
}
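
To check which terms the edge_ngram tokenizer actually puts in the index for a title, the _analyze API can be run against the test index. This is a small sketch added for illustration; the sample text is simply one of the titles indexed above.

```
# Illustrative only: list the terms my_analyzer emits for one sample title
GET test/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Foxes Quick Quick"
}
```

With min_gram set to 3, each word should come back as its lowercase prefixes of three or more characters (fox, foxe, foxes, then qui, quic, quick once per occurrence of Quick). A query for qui therefore matches the indexed term qui exactly, and this title contains that term twice.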

Then let's try the simplest query:

GET test/_search
{
  "query": {
    "match": {
      "title": {
        "query": "qui"
      }
    }
  }
}

Now your order is:

  1. Quick
  2. Foxes Quick Quick
  3. Quick 123
  4. Foxes Quick
  5. Quick Foxes
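
One possible reason this order differs from the one in the question: the test index has a single shard, while fundraisers has five. This is an assumption and has not been checked against the original index; the explain=true output there would confirm or refute it. By default each shard scores with its own term statistics, so with only a handful of documents the scores are not really comparable across shards. A sketch of the same query against the original index, using search_type=dfs_query_then_fetch so that term statistics are collected from all shards before scoring:

```
# Assumption: the odd ordering on fundraisers comes from per-shard statistics
GET fundraisers/_search?search_type=dfs_query_then_fetch
{
  "query": {
    "match": {
      "title": {
        "query": "qui"
      }
    }
  }
}
```

If the order on fundraisers then matches the list above, the shard count was the cause. If not, other documents in that index are probably influencing the statistics.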

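That's pretty much what you would expect, right? There may be other use cases that this query doesn't cover, but IMO you would then have to use multi_match and search on fields with different analyzers, because I'm not sure a phrase search on an edge n-gram field makes sense.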
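
A minimal sketch of what such a multi_match setup could look like. It rests on assumptions that are not in the question: a separate index named test2, and a sub-field title.standard that indexes the same text with the built-in standard analyzer. Both names and the boost of 2 are hypothetical.

```
# Hypothetical index: title keeps the edge_ngram analyzer,
# title.standard indexes whole words with the standard analyzer
PUT test2
{
  "settings": {
    "index": {
      "number_of_shards": "1",
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "filter": [
              "lowercase"
            ],
            "tokenizer": "my_tokenizer"
          }
        },
        "tokenizer": {
          "my_tokenizer": {
            "token_chars": [
              "letter",
              "digit"
            ],
            "min_gram": "3",
            "type": "edge_ngram",
            "max_gram": "50"
          }
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "my_analyzer",
          "fields": {
            "standard": {
              "type": "text",
              "analyzer": "standard"
            }
          }
        }
      }
    }
  }
}

# Search both fields; the boost on title.standard is an arbitrary example value
GET test2/_search
{
  "query": {
    "multi_match": {
      "query": "quick",
      "fields": [
        "title",
        "title.standard^2"
      ],
      "type": "most_fields"
    }
  }
}
```

With most_fields, a whole-word hit on title.standard adds to the score of the prefix hit on title, so titles containing the full word quick should rank above titles that only share a prefix with it. Whether that is the desired behavior depends on the use case.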