Question

So I have an Elasticsearch index, created with the request below. Querying it for a person named "ian" returns two results.

curl -XPUT 'http://localhost:9200/person' -d '{
    "settings": {
        "number_of_shards": 1,
        "analysis": {
            "filter": {
                "autocomplete_filter": {
                    "type":     "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type":      "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "autocomplete_filter"
                    ]
                }
            }
        }
    }
}'

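This is the query for "ian" that returns two results:

curl -XGET http://localhost:9200/person/_search -d '{ "query": { "match": { "_all": "ian" } } }'

But when I query for just the letters "ia", I should get at least as many results, yet I get none:

curl -XGET http://localhost:9200/person/_search -d '{ "query": { "match": { "_all": "ia" } } }'

Is my edge_ngram filter set up incorrectly? How can I fix this?

Edit: to clarify, after an insert I want every field to be analyzed with edge_ngram, so that I can search any field by a partial string and get that document back.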

Answer 1

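If you just want your analyzer to be used for all types and all properties (unless specified otherwise), you only need to set the "default" analyzer for the index. I couldn't find this in the ES docs (they are not always very user-friendly), but here is an example. I'm using ES 1.5, though I don't think that matters.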

PUT /person
{
   "settings": {
      "number_of_shards": 1,
      "analysis": {
         "filter": {
            "autocomplete_filter": {
               "type": "edge_ngram",
               "min_gram": 1,
               "max_gram": 20
            }
         },
         "analyzer": {
            "default": {
               "type": "custom",
               "tokenizer": "standard",
               "filter": [
                  "lowercase",
                  "autocomplete_filter"
               ]
            }
         }
      }
   }
}

Then I indexed the documents and ran your query, and it worked fine:

POST /person/doc/_bulk
{"index":{"_id":1}}
{"name":"Ian"}
{"index":{"_id":2}}
{"name":"Bob Smith"}

POST /person/_search
{
   "query": {
      "match": {
         "_all": "ia"
      }
   }
}
...
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1.4142135,
      "hits": [
         {
            "_index": "person",
            "_type": "doc",
            "_id": "1",
            "_score": 1.4142135,
            "_source": {
               "name": "Ian"
            }
         }
      ]
   }
}

Here is the code:

http://sense.qbox.io/gist/4e2114aafc4f3c507b4f23da8bb83f3ab00e2288

Answer 2

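The _all field uses the default "standard" analyzer unless you specify one for it. So the tokens in the _all field are not edge n-grams, which is why the search for "ia" returns no results. You generally want to avoid using the _all field for partial-match searches, because it can give unexpected or confusing results.

If you still need to use the _all field, set its analyzer to "autocomplete" explicitly.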
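
A minimal sketch of what that could look like, assuming Elasticsearch 1.x mapping syntax and a hypothetical type named "doc" (the question does not name a type). The "mappings" section below would be added to the index-creation request from the question, next to the "settings" that define the "autocomplete" analyzer:

"mappings": {
    "doc": {
        "_all": {
            "analyzer": "autocomplete"
        }
    }
}

Documents that are already indexed are not re-analyzed, so the index would have to be recreated and the documents indexed again for this to take effect. Note that with a single "analyzer" setting, the query text is also split into edge n-grams at search time.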

Answer 3

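You didn't specify any type that uses the analyzer. So you defined the analyzer but never used it. When you save a document under a new type, the mapping is defined implicitly and the standard analyzer is used. That analyzer does not create partial-word tokens, so your search for "ia" doesn't match anything.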
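
One way to see the difference is the _analyze API. The two requests below are a sketch that assumes the ES 1.x form of that API (query-string parameters) and the "person" index from the question, where the "autocomplete" analyzer is defined:

GET /person/_analyze?analyzer=standard&text=Ian

GET /person/_analyze?analyzer=autocomplete&text=Ian

The first should return the single token "ian"; the second should return "i", "ia" and "ian". Only the second set contains "ia", which is why a field indexed with the standard analyzer cannot match that query.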

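One way to handle this is to define your type explicitly and specify the analyzer to use in the mapping. Here is an example: the index name is "person" (same as yours), the type name is "doc", and the property "name" uses your analyzer for indexing (but not for searching):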

PUT /person
{
    "settings": {
        "number_of_shards": 1,
        "analysis": {
            "filter": {
                "autocomplete_filter": {
                    "type":     "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type":      "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "autocomplete_filter"
                    ]
                }
            }
        }
    },
    "mappings": {
        "doc":{
            "properties": {
                "name": {
                    "type": "string",
                    "index_analyzer": "autocomplete",
                    "search_analyzer": "standard"
                }
            }
        }
    }
}

To test it, I added a couple of documents:

POST /person/doc/_bulk
{"index":{"_id":1}}
{"name":"Ian"}
{"index":{"_id":2}}
{"name":"Bob Smith"}

Then I ran a match query against the "name" field:

POST /person/_search
{
   "query": {
      "match": {
         "name": "ia"
      }
   }
}
...
{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "person",
            "_type": "doc",
            "_id": "1",
            "_score": 1,
            "_source": {
               "name": "Ian"
            }
         }
      ]
   }
}
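
The reason for setting "search_analyzer" to "standard" can be illustrated with a query that overrides it. This is a sketch, assuming the two documents above and the "analyzer" parameter of the match query; "ib" is an arbitrary search string that is not a prefix of any indexed name:

POST /person/_search
{
   "query": {
      "match": {
         "name": {
            "query": "ib",
            "analyzer": "autocomplete"
         }
      }
   }
}

With the edge n-gram analyzer applied at search time, "ib" is split into "i" and "ib", and the token "i" would be expected to match "Ian". With the standard search analyzer from the mapping, the same search looks only for the token "ib" and should find nothing.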

Here is some code I used to test a few different things, including using the "_all" field so that your original query works:

http://sense.qbox.io/gist/61df5d17343651884c9422198b6a6bc00a6acb04

Elasticsearch with an ngram index does not find partial matches
