Elasticsearch match query doesn't match document

Date: 2020-06-10 07:21:01

Tags: elasticsearch

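I'm building a search backend for locality autocomplete (a simple version of what Google Maps offers). For a given city, say Philadelphia, users should be able to find it by its real name (Philadelphia) or by an alternative name (Philly).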

The query I'm using seems to work fine:

{
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "Philly",
          "type": "best_fields",
          "fields": [
            "locality",
            "alternative_names"
          ],
          "operator": "and"
        }
      },
      "filter": {
        "term": {
          "country_code": "US"
        }
      }
    }
  }
}

The problem I found concerns a city in Spain, Majadahonda:

/localities_index/localities/2387

{
  "_index": "localities_index",
  "_type": "localities",
  "_id": "2387",
  "_version": 1,
  "_seq_no": 133,
  "_primary_term": 4,
  "found": true,
  "_source": {
    "country_code": "es",
    "locality": "Majadahonda",
    "alternative_names": []
  }
}

Searching for the partial name Majadahond matches the document (see the example query below):

{
    "query": {
        "match": {
            "locality": {
                "query" : "Majadahond"
            }
        }
    }
}

/localities_index/localities/2387/_explain

{
  "_index": "localities_index",
  "_type": "localities",
  "_id": "2387",
  "matched": true,
  "explanation": {
    "value": 7.568702,
    "description": "weight(locality:majadahond in 112) [PerFieldSimilarity], result of:",
    "details": [
      {
        "value": 7.568702,
        "description": "score(freq=1.0), product of:",
        "details": [
          {
            "value": 2.2,
            "description": "boost",
            "details": []
          },
          {
            "value": 7.8130527,
            "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
            "details": [
              {
                "value": 1,
                "description": "n, number of documents containing term",
                "details": []
              },
              {
                "value": 3708,
                "description": "N, total number of documents with field",
                "details": []
              }
            ]
          },
          {
            "value": 0.44032967,
            "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
            "details": [
              {
                "value": 1.0,
                "description": "freq, occurrences of term within document",
                "details": []
              },
              {
                "value": 1.2,
                "description": "k1, term saturation parameter",
                "details": []
              },
              {
                "value": 0.75,
                "description": "b, length normalization parameter",
                "details": []
              },
              {
                "value": 9.0,
                "description": "dl, length of field",
                "details": []
              },
              {
                "value": 8.341694,
                "description": "avgdl, average length of field",
                "details": []
              }
            ]
          }
        ]
      }
    ]
  }
}

If I use the full name, however, it does not match, and I can't find an explanation. So I tried a simpler query, to rule out other problems:

{
    "query": {
        "match": {
            "locality": {
                "query" : "Majadahonda"
            }
        }
    }
}

/localities_index/localities/2387/_explain

{
  "_index": "localities_index",
  "_type": "localities",
  "_id": "2387",
  "matched": false,
  "explanation": {
    "value": 0.0,
    "description": "no matching term",
    "details": []
  }
}

/localities_index/_settings

{
  "localities_index": {
    "settings": {
      "index": {
        "number_of_shards": "1",
        "blocks": {
          "read_only_allow_delete": "false"
        },
        "provided_name": "localities_index",
        "creation_date": "1591341622712",
        "analysis": {
          "analyzer": {
            "autocomplete": {
              "filter": [
                "lowercase",
                "asciifolding"
              ],
              "tokenizer": "autocomplete"
            },
            "autocomplete_search": {
              "filter": [
                "lowercase",
                "asciifolding"
              ],
              "tokenizer": "lowercase"
            }
          },
          "tokenizer": {
            "autocomplete": {
              "token_chars": [
                "letter",
                "digit"
              ],
              "min_gram": "2",
              "type": "edge_ngram",
              "max_gram": "15"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "zgdfv5bBS8-XkK5Aoh9XZA",
        "version": {
          "created": "7040099"
        }
      }
    }
  }
}

/localities_index/_mapping

{
  "localities_index": {
    "mappings": {
      "properties": {
        "alternative_names": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        },
        "country_code": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "locality": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}

1 answer:

Answer 0 (score: 1)

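Your edge_ngram tokenizer is limited to 10 characters, so the last token it produces is majadahond (length 10), without the trailing a. The search analyzer emits the whole word majadahonda (11 characters) as a single token, and that token is not in the index.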
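The effect can be reproduced outside Elasticsearch with a rough sketch. The helper names `edge_ngrams` and `matches` are made up for this illustration, and the code only approximates the analyzers from the question for a single ASCII word. Note that the `_explain` output reports dl = 9, which is what grams of length 2 through 10 would give, even though the settings shown list max_gram as 15; one possible reading is that the document was indexed under a limit of 10.

```python
def edge_ngrams(text, min_gram, max_gram):
    """Approximate an edge_ngram tokenizer plus lowercase filter for one word."""
    word = text.lower()
    upper = min(max_gram, len(word))
    # One token per prefix length between min_gram and the effective upper bound.
    return [word[:n] for n in range(min_gram, upper + 1)]


def matches(indexed_tokens, query):
    """Approximate the search side: a single ASCII word stays one lowercased token."""
    return query.lower() in indexed_tokens


tokens_10 = edge_ngrams("Majadahonda", min_gram=2, max_gram=10)
tokens_15 = edge_ngrams("Majadahonda", min_gram=2, max_gram=15)

print(tokens_10[-1], len(tokens_10))       # longest indexed gram and gram count
print(matches(tokens_10, "Majadahond"))    # partial name, limit 10
print(matches(tokens_10, "Majadahonda"))   # full name, limit 10
print(matches(tokens_15, "Majadahonda"))   # full name, limit 15
```

With a limit of 10 the full 11-character name never appears among the indexed tokens, while the 10-character prefix does, which mirrors the two `_explain` results in the question.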

So you probably want to increase that max_gram setting.
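As a minimal sketch of that change, the snippet below builds an index-creation body that reuses the analysis settings from the question and only raises max_gram. The value 20 and the index name `localities_index_v2` are assumptions chosen for illustration, not values from the original post.

```python
import json

# Illustrative assumption: any value at least as long as the longest
# name you want to match in full would do.
NEW_MAX_GRAM = 20

settings_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "autocomplete": {
                    "tokenizer": "autocomplete",
                    "filter": ["lowercase", "asciifolding"],
                },
                "autocomplete_search": {
                    "tokenizer": "lowercase",
                    "filter": ["lowercase", "asciifolding"],
                },
            },
            "tokenizer": {
                "autocomplete": {
                    "type": "edge_ngram",
                    "min_gram": 2,
                    "max_gram": NEW_MAX_GRAM,
                    "token_chars": ["letter", "digit"],
                }
            },
        }
    }
}

# Request body for creating a new index, e.g. PUT /localities_index_v2
# (hypothetical index name).
print(json.dumps(settings_body, indent=2))
```

Documents that are already indexed keep the tokens produced by the old analyzer, so they have to be reindexed before the full name starts matching.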