ElasticSearch index and search analyzers together

Time: 2014-12-23 01:39:23

Tags: search elasticsearch full-text-search

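I just watched this video (https://www.youtube.com/watch?v=7FLXjgB0PQI) and came away with a question about ElasticSearch analyzers. I have read the official documentation and several other articles about analysis and analyzers, and I am a bit confused.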

For example, I have the following index configuration:

"settings" : {
    "analysis" : {      
      "filter" : {
        "autocomplete" : {
          "type" : "edge_ngram",
          "min_gram" : 1,
          "max_gram" : 20
        }
      },
      "analyzer" : {
        "autocomplete" : {
          "type" : "custom",
          "tokenizer" : "standard",
          "filter" : ["lowercase", "autocomplete"]
        }
      }
    }
  },
  "mappings" : {
    "user" : {
      "properties" : {
        "name" : {
          "type" : "multi_field",
          "fields" : {
            "name" : {
              "type" : "string",
              "analyzer" : "standard"
            },
            "autocomplete" : {
              "type" : "string",
              "index_analyzer" : "autocomplete",
              "search_analyzer" : "standard"
            }
          }
        }
      }
    }
  }

Then I run two search requests separately. This one:

{
  "match" : {
    "name.autocomplete" : "john smi"
  }
}

and this one:

{
  "match" : {
    "name" : "john smi"
  }
}

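If I understand correctly, I should see the same results, because in both cases ES should use the standard analyzer at search time. But I get different results. Why?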

Update

The index contains the following names: "john smith", "johnathan smith".

1 answer:

Answer 0: (score: 0)

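When I tried what you have, with the required "wrapping" added around your snippets, I got the same results from both queries. First I created an index: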

curl -XPOST "http://localhost:9200/test_index/" -d'
{
   "settings": {
      "analysis": {
         "filter": {
            "autocomplete": {
               "type": "edge_ngram",
               "min_gram": 1,
               "max_gram": 20
            }
         },
         "analyzer": {
            "autocomplete": {
               "type": "custom",
               "tokenizer": "standard",
               "filter": [
                  "lowercase",
                  "autocomplete"
               ]
            }
         }
      }
   },
   "mappings": {
      "user": {
         "properties": {
            "name": {
               "type": "multi_field",
               "fields": {
                  "name": {
                     "type": "string",
                     "analyzer": "standard"
                  },
                  "autocomplete": {
                     "type": "string",
                     "index_analyzer": "autocomplete",
                     "search_analyzer": "standard"
                  }
               }
            }
         }
      }
   }
}'
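
To make the index-time difference between the two sub-fields concrete, here is a rough Python sketch of what each analyzer would emit for "John Smith". It is a simulation, not Elasticsearch itself: the regex split is only an approximation of the standard tokenizer, and the helper names (standard_tokens, edge_ngrams, autocomplete_tokens) are made up for this illustration.

```python
import re

def standard_tokens(text):
    # Rough stand-in for the standard tokenizer plus the lowercase filter:
    # split on non-word characters and lowercase each token.
    return [t.lower() for t in re.findall(r"\w+", text)]

def edge_ngrams(token, min_gram=1, max_gram=20):
    # Prefixes of the token, from min_gram up to max_gram characters,
    # mirroring the "autocomplete" edge_ngram filter settings above.
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

def autocomplete_tokens(text):
    # standard tokenizer -> lowercase -> edge_ngram, as in the custom analyzer.
    return [g for t in standard_tokens(text) for g in edge_ngrams(t)]

print(standard_tokens("John Smith"))      # terms indexed in "name"
print(autocomplete_tokens("John Smith"))  # terms indexed in "name.autocomplete"
```

So "name" stores only whole words, while "name.autocomplete" stores every prefix of every word. The search analyzer is standard for both, but it is applied to the query text, not to what is already in the index.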

Then I added a document:

curl -XPUT "http://localhost:9200/test_index/user/1" -d'
{
    "name": "John Smith"
}'

The first search returns the document:

curl -XPOST "http://localhost:9200/test_index/user/_search" -d'
{
   "query": {
      "match": {
         "name.autocomplete": "john smith"
      }
   }
}'
...
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.2712221,
      "hits": [
         {
            "_index": "test_index",
            "_type": "user",
            "_id": "1",
            "_score": 0.2712221,
            "_source": {
               "name": "John Smith"
            }
         }
      ]
   }
}

And so does the second:

curl -XPOST "http://localhost:9200/test_index/user/_search" -d'
{
   "query": {
      "match": {
         "name": "john smith"
      }
   }
}'
...
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.2712221,
      "hits": [
         {
            "_index": "test_index",
            "_type": "user",
            "_id": "1",
            "_score": 0.2712221,
            "_source": {
               "name": "John Smith"
            }
         }
      ]
   }
}

Is there anything else in your setup that differs from what I did here?
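
One difference is visible in the post itself: the question searches for "john smi" against the names "john smith" and "johnathan smith", while the searches above use the full "john smith". The sketch below is a simplified Python simulation of how a match query (default OR operator) might behave in each case. It ignores scoring, approximates the standard tokenizer with a regex, and uses hypothetical helper names (index_terms, match), so treat it as an illustration rather than the actual Elasticsearch behaviour.

```python
import re

def standard_tokens(text):
    # Approximation of the standard analyzer: split on non-word chars, lowercase.
    return [t.lower() for t in re.findall(r"\w+", text)]

def edge_ngrams(token, min_gram=1, max_gram=20):
    # All prefixes of the token between min_gram and max_gram characters.
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

def index_terms(name, field):
    # Terms stored at index time: whole words for "name",
    # edge n-grams of each word for "name.autocomplete".
    tokens = standard_tokens(name)
    if field == "name.autocomplete":
        return {g for t in tokens for g in edge_ngrams(t)}
    return set(tokens)

def match(query, docs, field):
    # The query is analyzed with the standard analyzer for both fields;
    # with the default OR operator, one matching term is enough for a hit.
    terms = standard_tokens(query)
    return [d for d in docs if any(t in index_terms(d, field) for t in terms)]

docs = ["john smith", "johnathan smith"]
for query in ("john smi", "john smith"):
    for field in ("name", "name.autocomplete"):
        print(query, "|", field, "|", match(query, docs, field))
```

Under these assumptions, "john smi" hits only "john smith" on "name" (the term "smi" is not in the index, and "john" is not a whole word of "johnathan smith"), but hits both names on "name.autocomplete", where "john" and "smi" exist as indexed prefixes. With the full "john smith", both fields return the same set of documents, which would explain why the test above shows no difference.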

Here is the code I used for this question:

http://sense.qbox.io/gist/4c8299be570c87f1179f70bfd780a7e9f8d40919