Question

Here is a simplified version of what I have:

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "lowercase"
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": [
            "letter"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "title": "Quick Foxes" 
}

PUT my_index/_doc/2
{
  "title": "Quick Fuxes" 
}

PUT my_index/_doc/3
{
  "title": "Foxes Quick" 
}

PUT my_index/_doc/4
{
  "title": "Foxes Slow" 
}

I am searching for Quick Fo to test the autocomplete:

GET my_index/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Quick Fo",
        "operator": "and"
      }
    }
  }
}

The problem is that this query also returns Foxes Quick, when I expected only Quick Foxes:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.5753642,
        "_source": {
          "title": "Quick Foxes"
        }
      },
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.5753642,
        "_source": {
          "title": "Foxes Quick"   <<<----- WHY???
        }
      }
    ]
  }
}

What can I adjust so that the query behaves like a classic autocomplete, where Quick Fo never returns Foxes Quick and returns only Quick Foxes?

---- Additional information -----------------------

This works for me:

PUT my_index1
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": { 
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "text": {
          "type": "text",
          "analyzer": "autocomplete", 
          "search_analyzer": "standard" 
        }
      }
    }
  }
}


PUT my_index1/_doc/1
{
  "text": "Quick Brown Fox" 
}

PUT my_index1/_doc/2
{
  "text": "Quick Frown Fox" 
}


PUT my_index1/_doc/3
{
  "text": "Quick Fragile Fox" 
}


GET my_index1/_search
{
  "query": {
    "match": {
      "text": {
        "query": "quick br", 
        "operator": "and"
      }
    }
  }
}

Answer 1

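The problem comes from your search analyzer, autocomplete_search, which uses the lowercase tokenizer. Your search text Quick Fo is therefore split into two tokens, quick and fo (note the lowercasing), and these are matched against the tokens that the autocomplete analyzer generated for the indexed documents.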

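The title Foxes Quick, analyzed with the autocomplete analyzer, also contains both the quick and fo tokens, so it matches the search tokens. A match query with operator and only requires that every search token be present in the field; it does not check the order in which they appear.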
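To make the explanation above concrete, here is a minimal Python sketch that approximates the two analyzers from the question. It is a simulation and not Elasticsearch itself: the function names are hypothetical, and it assumes that "letter" tokens can be approximated with a Unicode-letter regular expression.

```python
import re

def edge_ngram_tokenize(text, min_gram=2, max_gram=10):
    """Approximate the question's `autocomplete` analyzer: split on
    non-letters, emit word prefixes of min_gram..max_gram characters,
    then lowercase them."""
    tokens = []
    for word in re.findall(r"[^\W\d_]+", text):
        for n in range(min_gram, min(len(word), max_gram) + 1):
            tokens.append(word[:n].lower())
    return tokens

def lowercase_tokenize(text):
    """Approximate the `lowercase` tokenizer used at search time:
    split on non-letters and lowercase each token."""
    return [word.lower() for word in re.findall(r"[^\W\d_]+", text)]

def matches_with_and(query, title):
    """Mimic a match query with operator "and": every search token must
    be present among the indexed tokens; order is ignored."""
    indexed = set(edge_ngram_tokenize(title))
    return all(token in indexed for token in lowercase_tokenize(query))

titles = ["Quick Foxes", "Quick Fuxes", "Foxes Quick", "Foxes Slow"]
for title in titles:
    print(title, edge_ngram_tokenize(title), matches_with_and("Quick Fo", title))
```

In this simulation both Quick Foxes and Foxes Quick contain quick and fo, so both satisfy the query, which is consistent with the two hits shown in the question.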

You can use the _analyze API to inspect the tokens generated for your documents and for your search text, which makes this easier to understand.
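As a rough illustration of that check, the following Python sketch prints two _analyze requests in the console syntax used in the question. The helper name analyze_request is hypothetical; the sketch only builds the request text and assumes you paste it into a console connected to a cluster that already has my_index.

```python
import json

def analyze_request(index, analyzer, text):
    """Render an _analyze call in console syntax (hypothetical helper).
    The body names an analyzer defined on the index and the text to analyze."""
    body = {"analyzer": analyzer, "text": text}
    return f"GET {index}/_analyze\n{json.dumps(body, indent=2)}"

# Tokens produced at index time for one of the documents.
print(analyze_request("my_index", "autocomplete", "Foxes Quick"))
print()
# Tokens produced at search time for the query text.
print(analyze_request("my_index", "autocomplete_search", "Quick Fo"))
```

Comparing the two responses shows which index-time tokens the search tokens can match.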

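For how to implement autocomplete, see the official ES documentation: https://www.elastic.co/guide/en/elasticsearch/guide/master/_index_time_search_as_you_type.html. It also uses a different search-time analyzer, but that approach has limitations and does not cover every use case (especially with documents like yours), so I implemented it with a different design based on my business requirements.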

I hope this explains clearly why the second document is returned in your case.

Edit: In your case, IMO, a Match phrase prefix query would also be more useful.
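A possible shape for such a query is sketched below in Python, which prints a match_phrase_prefix request in console syntax. The helper name is hypothetical and the query has not been run against the mapping from the question; because a phrase query depends on token positions, and edge n-grams affect those positions, the results should be verified on a real index.

```python
import json

def match_phrase_prefix_request(index, field, text):
    """Render a match_phrase_prefix search in console syntax
    (hypothetical helper). All terms but the last are treated as a
    phrase, and the last term is treated as a prefix."""
    body = {"query": {"match_phrase_prefix": {field: {"query": text}}}}
    return f"GET {index}/_search\n{json.dumps(body, indent=2)}"

print(match_phrase_prefix_request("my_index", "title", "Quick Fo"))
```

Unlike a match query with operator and, a phrase-style query takes term order into account, which is the property the question is asking for.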

Elasticsearch Ngrams: unexpected behavior with autocomplete
